Statistics and Probability Letters 83 (2013) 1127–1135
Estimation of the Shannon’s entropy of several shifted exponential populations Suchandan Kayal, Somesh Kumar ∗ Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur - 721302, India
Article history: Received 22 January 2011 Received in revised form 10 January 2013 Accepted 11 January 2013 Available online 20 January 2013
abstract Estimation of the entropy of several exponential distributions is considered. A general inadmissibility result for the scale equivariant estimators is proved. The results are extended to the case of unequal sample sizes. Risk functions of proposed estimators are compared numerically. © 2013 Elsevier B.V. All rights reserved.
Keywords: Entropy Equivariant estimator Inadmissibility Monotone likelihood ratio Brewster–Zidek technique
1. Introduction The concept of entropy was introduced by Clausius, Boltzmann and Gibbs in thermodynamics and statistical mechanics in the nineteenth century as a measure of disorder of a physical system. A major boost to the concept was provided by Shannon (1948) who related it to the theory of communication as a measure of information. Suppose a random variable X has the probability density function fθ (x), θ ∈ Θ . Then the Shannon’s entropy of the random variable X is defined by H (θ ) = Eθ (− ln fθ (X )). Presently the term entropy has applications in such diverse areas as molecular biology, hydrology, computer science and meteorology. For example, molecular biologists use the concept of Shannon’s entropy in the analysis of patterns in gene sequences. In dynamical systems, entropy is used to measure the exponential complexity of the system. In social studies, entropy is used as a measure of the decay of systems such as organizations, social orders or practices. For a detailed account of importance and applications of the principles of entropy in various disciplines one may refer to Cover and Thomas (1999), Adami (2004), Misra et al. (2005), Robinson (2011) and Liu et al. (2011). There have been attempts by several authors for the parametric estimation of entropy. Lazo and Rathie (1978) obtained entropy expressions of various univariate continuous probability distributions. Ahmed and Gokhale (1989) derived the expressions of entropy of several multivariate distributions. In particular, they studied multivariate normal and exponential distributions and obtained uniformly minimum variance unbiased estimator (UMVUE ) of the entropy. The problem of estimating the entropy of a multivariate normal distribution with respect to the squared error loss function has been further investigated by Misra et al. (2005). They showed that the best affine equivariant estimator (BAEE ) is unbiased and is also generalized Bayes. 
Further improved estimators were obtained dominating the BAEE.
∗ Corresponding author: S. Kumar. Tel.: +91 3222283662; fax: +91 3222255303.
0167-7152/$ – see front matter © 2013 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2013.01.012
S. Kayal, S. Kumar / Statistics and Probability Letters 83 (2013) 1127–1135
The problem of estimating the Shannon’s entropy in exponential populations is considered here. The exponential distribution can be obtained as a distribution with the maximum entropy when a continuous random variable has a given mean and support on the positive real line. Cover and Thomas (1999) describe an application in atmospheric physics. They consider the distribution of the height of molecules in the atmosphere. Here the average potential energy of molecules is fixed and the gas tends to the distribution with the maximum entropy subject to the restriction that the average potential energy is constant. In fact the density of atmosphere is known to have an exponential distribution. If σ is the scale parameter of the exponential distribution then the expression for the entropy is 1 + ln σ . Therefore in an exponential population, the estimation of entropy is equivalent to estimation of the logarithm of the scale parameter. It was first observed by Stein (1964) that the BAEE of the normal variance is inadmissible. Brown (1968) gave general conditions under which Stein type results can be obtained for scale parameter families. However, his results are not applicable to many situations such as a shifted exponential distribution. Arnold (1970) proved that the BAEE of the scale parameter in a shifted exponential distribution is inadmissible with respect to a squared error loss. Zidek (1973) extended the result of Arnold to a larger class of bowl-shaped loss functions. The estimators of Arnold and Zidek are not smooth. Brewster (1974) derived a smooth improved estimator, however, it does not dominate the BAEE in the whole parameter space. An improvement over the BAEE of the reciprocal of the scale parameter was derived by Sharma (1977). Petropoulos and Kourouklis (2002) derived a class of improved estimators with respect to a scale invariant loss function. 
Recently Bobotas and Kourouklis (2009) have obtained a new class of improving estimators for the scale parameter in the presence of a nuisance parameter under a scale invariant loss. In particular the result yields a class of estimators improving upon the BAEE of the scale parameter in an exponential population. Kayal and Kumar (2011a) considered the problem of estimating the entropy of an exponential distribution with respect to a linex loss function. For the negative exponential model they proved that the best scale equivariant estimator of the entropy is admissible and minimax. However, for the shifted exponential distribution, due to the presence of a nuisance parameter, the sufficient statistic changes and the BAEE of the entropy is shown to be inadmissible (Kayal and Kumar, 2011a). The estimation of the entropy of k (≥2) negative exponential populations was considered by Kayal and Kumar (2011b) with respect to the squared error and linex loss functions.
In this paper we consider the estimation of the entropy of k (≥2) shifted exponential populations, when they have a common scale parameter σ and different location parameters µ1 , . . . , µk . Note that this model is not covered by the work mentioned in the previous two paragraphs.
The exponential distribution is one of the most widely used distributions for describing lifetimes of components, service times in queueing systems, time periods between two successive occurrences in a Poisson process, etc. Recently Pal et al. (2006) have demonstrated that real life data sets on stem sizes of male and female species of dioecious plants, as obtained from Sakai and Burries (1985), are fitted by exponential distributions. Dragulescu and Yakovenko (2001) have shown that individual annual income data in the USA are fitted very well by an exponential distribution. Here one may consider the parameters µ1 , . . . , µk to denote the income levels below which tax filing is not required in different states.
However, the average income levels may be same due to overall economic policies of the country which is applicable to all citizens. Similarly one may consider service times at check-in counters of k different airlines at different airports. Here due to different starting times, the parameters µ1 , . . . , µk may be different but average service times (once the service has started) may be same due to similar nature of trained service persons and equipment used. In Section 2, we obtain the BAEE for the Shannon’s entropy for our model. A general inadmissibility result for the scale equivariant estimators is proved. Consequently, a new estimator is obtained which dominates the BAEE under the squared error loss function. Further, problems of estimating the entropy are considered in restricted parameter spaces and improved estimators are derived. In Section 3 the results are extended to the case when sample sizes are unequal. A heuristic discussion is added in Section 4. A numerical comparison of the risk values of the proposed estimators is presented in Section 5. 2. The best affine equivariant estimator Let (Xi1 , . . . , Xin ) be a random sample taken from the population Πi , i = 1, . . . , k (k ≥ 2). We assume that the k samples are taken independently. The probability density associated with the population Πi is given by
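\[
f_i(x) = \begin{cases} \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\mu_i}{\sigma}\right), & \text{if } x > \mu_i,\\[4pt] 0, & \text{otherwise.} \end{cases} \tag{1}
\]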
The expression of the Shannon's entropy is $H(\sigma) = k(1 + \ln\sigma)$. We consider an equivalent problem of estimating $Q(\sigma) = \ln\sigma$ under the squared error loss
\[
L(\sigma, \delta) = (\delta - \ln\sigma)^2. \tag{2}
\]
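As a quick check of the entropy expression, the following short derivation (a sketch for a single population, added here for illustration) computes the entropy of the density (1) directly from the definition $H = E(-\ln f(X))$. It uses only the fact that the mean of the shifted exponential density (1) is $\mu_i + \sigma$.

```latex
\begin{align*}
-\ln f_i(X) &= \ln\sigma + \frac{X-\mu_i}{\sigma}, \qquad X > \mu_i,\\
E\bigl(-\ln f_i(X)\bigr) &= \ln\sigma + \frac{E(X)-\mu_i}{\sigma}
  = \ln\sigma + \frac{(\mu_i+\sigma)-\mu_i}{\sigma} = 1+\ln\sigma .
\end{align*}
```

Summing over the k independent populations gives $k(1+\ln\sigma)$, which does not involve the location parameters; this is why estimating the entropy reduces to estimating $\ln\sigma$.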
On the basis of the i-th sample $\{X_{i1}, \ldots, X_{in}\}$, $(X_{i(1)}, Y_i)$ is a complete and sufficient statistic for $(\mu_i, \sigma)$, where $X_{i(1)} = \min_{1\le j\le n} X_{ij}$ and $Y_i = \sum_{j=1}^{n} X_{ij}$. Further, we define $Z_i = Y_i - nX_{i(1)}$. Then $X_{i(1)}$ and $Z_i$ are independently distributed. Also $X_{i(1)}$ follows an exponential distribution with location parameter $\mu_i$ and scale parameter $\sigma/n$, whereas $2Z_i/\sigma$ follows a chi-square distribution with $2(n-1)$ degrees of freedom (see, for example, Lehmann and Casella, 1998, p. 43). Let $X_{(1)} = (X_{1(1)}, \ldots, X_{k(1)})$ and $T = \sum_{i=1}^{k} Z_i$. Then $(X_{(1)}, T)$ is complete and sufficient for $(\mu, \sigma)$, where $\mu = (\mu_1, \ldots, \mu_k)$.
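The statistics defined above can be computed directly from the raw samples. The following is a minimal sketch, assuming k samples of a common size n stored as lists of floats; the function name `sufficient_statistics` and the data are hypothetical and only serve as an illustration.

```python
from typing import List, Sequence, Tuple


def sufficient_statistics(samples: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (X_(1), T) for k samples of a common size n.

    X_(1) holds the k sample minima X_i(1); T is the sum over i of
    Z_i = Y_i - n * X_i(1), where Y_i is the total of the i-th sample.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    minima = [min(s) for s in samples]
    z = [sum(s) - n * m for s, m in zip(samples, minima)]
    return minima, sum(z)


# Hypothetical data: k = 2 populations, n = 3 observations each.
x_min, t = sufficient_statistics([[2.0, 3.5, 2.5], [1.0, 4.0, 1.5]])
print(x_min, t)
```

Only the minima and the pooled statistic T are needed by the estimators considered in this section, so the raw observations can be discarded after this step.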
It should be noted that $X_{(1)}$ and $T$ are independently distributed. Further, using the additive property of the chi-square distribution, it can be shown that $2T/\sigma$ follows a chi-square distribution with $2k(n-1)$ degrees of freedom. The maximum likelihood estimator (MLE) of $Q(\sigma)$ is $\delta_{ML} = \ln T - \ln(kn)$. We derive the UMVUE of $Q(\sigma)$ as $\delta_{MV} = \ln T - \psi(k(n-1))$, where $\psi$ denotes the Euler psi (digamma) function, defined as $\psi(x) = \frac{d}{dx}(\ln\Gamma(x))$. Consider the transformations $g_{a,b_i}(x_{ij}) = ax_{ij} + b_i$, $j = 1, \ldots, n$, $i = 1, \ldots, k$. Here $a$ is kept the same so that the common scale property is preserved after the transformation. Writing $b = (b_1, \ldots, b_k)$ and $g_{a,b} = (g_{a,b_1}, \ldots, g_{a,b_k})$, we see that under the transformation $g_{a,b}$,
(µ, σ ) → (aµ + b, aσ ),
(X (1) , T ) → (aX (1) + b, aT ).
Consequently, we get ln σ → ln σ + ln a. The loss function (2) is invariant under the group Ga,b of affine transformations ga,b , a > 0, b ∈ Rk , if δ → δ + ln a. The form of an affine equivariant estimator is obtained as
δc (X (1) , T ) = ln T − c
(3)
for any constant c. The following theorem gives the BAEE of $Q(\sigma)$.
Theorem 1. Under the squared error loss function (2), the BAEE of $Q(\sigma)$ is $\delta_{c_0}(X_{(1)}, T)$, where $c_0 = \psi(k(n-1))$.
Proof. The risk of the estimators of the form (3) is $R(\sigma, \delta_c) = E(\ln T - c - \ln\sigma)^2$, which is minimized for $c = E(\ln(T/\sigma)) = \psi(k(n-1)) = c_0$, say. Hence the result follows.
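To make the estimators concrete, the sketch below evaluates the MLE and the BAEE, together with their exact risks under the loss (2). It relies on two standard facts about the log-gamma distribution that are not stated in the text: if $T/\sigma$ is Gamma(m, 1) then $\ln(T/\sigma)$ has mean $\psi(m)$ and variance $\psi'(m)$. Since $m = k(n-1)$ is an integer, the digamma and trigamma values are obtained from finite sums; all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def trigamma_int(m: int) -> float:
    """psi'(m) for a positive integer m: pi^2/6 - sum_{j=1}^{m-1} 1/j^2."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return math.pi ** 2 / 6.0 - sum(1.0 / j ** 2 for j in range(1, m))


def delta_ml(t: float, k: int, n: int) -> float:
    """MLE of ln(sigma): ln T - ln(kn)."""
    return math.log(t) - math.log(k * n)


def delta_baee(t: float, k: int, n: int) -> float:
    """BAEE (and UMVUE) of ln(sigma): ln T - psi(k(n-1))."""
    return math.log(t) - digamma_int(k * (n - 1))


def risk_baee(k: int, n: int) -> float:
    """Exact risk of the BAEE: the variance of ln(T/sigma)."""
    return trigamma_int(k * (n - 1))


def risk_ml(k: int, n: int) -> float:
    """Exact risk of the MLE: variance plus squared bias."""
    bias = digamma_int(k * (n - 1)) - math.log(k * n)
    return trigamma_int(k * (n - 1)) + bias ** 2


print(risk_baee(2, 4), risk_ml(2, 4))
```

Because both estimators have the form ln T - c, their risks do not depend on the parameters, so a single number summarizes each of them for given k and n.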
Remark 1. The BAEE is also the UMVUE. Also using Jensen’s inequality it can be shown that ψ(k(n − 1)) < ln(kn) which means that the MLE underestimates Q (σ ). 2.1. Improving upon the best affine equivariant estimator To get an improvement over the BAEE δc0 , we consider a larger class of estimators. Consider the scale group of transformations Ga = {ga : ga (x) = ax, a > 0}. The problem of estimating Q (σ ) remains invariant with respect to the group Ga . Under the transformation ga , we have
(µ, σ ) → (aµ, aσ ),
(X (1) , T ) → (aX (1) , aT )
and therefore, ln σ → ln σ +ln a. It can be also shown that the loss function (2) is invariant under the group Ga if δ → δ+ln a. Therefore we get the form of a scale equivariant estimator as
δφ (W , T ) = ln T + φ(W ),
(4)
where $W = (W_1, \ldots, W_k)$, $W_i = X_{i(1)}/T$ and $\phi$ is a real valued measurable function. A general inadmissibility result for the estimators of the form (4) is proved in the theorem below. Let $B_1 = \{w : w_{(1)} > 0\}$, $B_2 = \{w : u < \exp(\phi(w) + \psi(kn))\}$, $B_3 = \{w : w_{(k)} < 0\}$, $u = n\sum_{i=1}^{k} w_i + 1$, $w_{(1)} = \min\{w_1, \ldots, w_k\}$, $w_{(k)} = \max\{w_1, \ldots, w_k\}$ and $w_i = x_{i(1)}/t$ for $i = 1, \ldots, k$. Also define, for a function $\phi(w)$ as in (4),
\[
\phi_0(w) = \begin{cases} \ln\left(n\sum_{i=1}^{k} w_i + 1\right) - \psi(kn), & \text{if } w \in (B_1 \cap B_2) \cup (B_3 \cap B_2^c),\\[4pt] \phi(w), & \text{otherwise.} \end{cases} \tag{5}
\]
Theorem 2. Let $\delta_\phi$ be a scale equivariant estimator of the form (4) and $\phi_0(w)$ be as defined in (5). If there exists some $(\mu, \sigma)$ such that $P_{(\mu,\sigma)}(\phi_0(W) \neq \phi(W)) > 0$, then under the squared error loss function (2), the estimator $\delta_{\phi_0}$ dominates $\delta_\phi$.
Proof. The risk function of the estimators of the form $\delta_\phi$ given in (4) can be written as $R(\mu, \sigma, \delta_\phi) = E^{W} R_1(\mu, \sigma, W, \delta_\phi)$, where $R_1(\mu, \sigma, w, \delta_\phi)$ denotes the conditional risk of $\delta_\phi$ given $W = w$, given by
\[
R_1(\mu, \sigma, w, \delta_\phi) = E[(\delta_\phi - \ln\sigma)^2 \mid W = w] = E[(\ln(T/\sigma) + \phi(W))^2 \mid W = w]. \tag{6}
\]
We notice that the conditional risk R1 (µ, σ , w, δφ ) in (6) is only a function of the ratio µ/σ . Therefore, without loss of generality we can take σ = 1. Again the conditional risk R1 is a convex function of φ , and the choice of φ minimizing R1 can be obtained as
\[
\hat{\phi}(w, \mu) = -E(\ln T \mid W = w). \tag{7}
\]
In order to evaluate the term in (7), we derive the conditional distribution of $T$ given $W = w$. The joint probability density of $X_{(1)}$ and $T$ is
\[
f_{(X_{(1)},T)}(x_{(1)}, t) = \frac{n^k}{\Gamma(k(n-1))}\, e^{-\left[n\sum_{i=1}^{k}(x_{i(1)}-\mu_i) + t\right]}\, t^{k(n-1)-1}, \quad t \ge 0,\ x_{i(1)} \ge \mu_i,\ i = 1, \ldots, k. \tag{8}
\]
Now using the transformations $w_1 = x_{1(1)}/t, \ldots, w_k = x_{k(1)}/t$ and $t = t$, we get the joint density of $W$ and $T$ as
\[
f_{(W,T)}(w, t) = \frac{n^k}{\Gamma(k(n-1))}\, e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}, \quad t \ge 0,\ t w_i \ge \mu_i.
\]
To find the marginal density of $W$, we integrate $f_{(W,T)}(w, t)$ with respect to $t$.
Case (i) Suppose all $\mu_i$'s are non-negative, $i = 1, \ldots, k$: In this case, $t$ varies from $\eta_1$ to $\infty$, where $\eta_1 = \max\{\mu_1/w_1, \ldots, \mu_k/w_k\}$. Therefore, the marginal density of $W$ is
\[
f_W(w) = \frac{n^k}{\Gamma(k(n-1))} \int_{\eta_1}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}\, dt, \quad w_i > 0.
\]
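Consequently, the conditional density of $T$ given $W = w$ is given by
\[
f_{T|W}(t \mid w) = \frac{e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}}{\int_{\eta_1}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}\, dt}, \quad t > \eta_1.
\]
Therefore, we get
\[
E(\ln T \mid W = w) = \frac{\int_{\eta_1}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_1}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.
\]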
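Substituting the expression of $E(\ln T \mid W = w)$ in (7), we get
\[
\hat{\phi}(w, \mu) = \ln u - \frac{\int_{\eta_1'}^{\infty} \ln p\; e^{-p}\, p^{kn-1}\, dp}{\int_{\eta_1'}^{\infty} e^{-p}\, p^{kn-1}\, dp} = \ln u - h_1(\eta_1'), \text{ say}, \tag{9}
\]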
where $\eta_1' = \eta_1 u$. In order to apply the Brewster–Zidek technique (1974) we need to find the supremum and infimum of $\hat{\phi}(w, \mu)$ given in (9). To this end, we show that the density function
\[
\frac{e^{-p}\, p^{kn-1}}{\int_{\eta_1'}^{\infty} e^{-p}\, p^{kn-1}\, dp}, \quad \eta_1' < p < \infty,
\]
has a monotone likelihood ratio property in $\eta_1'$ and then apply Lemma 3.4.2 in Lehmann and Romano (2009). It can then be shown that $h_1(\eta_1')$ is a nondecreasing function of $\eta_1'$, and $\eta_1'$ lies between 0 and $\infty$. Thus we get
\[
\sup_{\eta_1'} h_1(\eta_1') = +\infty \quad \text{and} \quad \inf_{\eta_1'} h_1(\eta_1') = \psi(kn).
\]
Therefore, from (9) we get
\[
\sup_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn) \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.
\]
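The behaviour of $h_1$ used in Case (i) can be checked numerically. The sketch below approximates $h_1(\eta) = E(\ln P \mid P > \eta)$ for $P$ distributed as Gamma(kn, 1) with composite Simpson's rule. It is an illustration only, not part of the proof: the function name, the truncation width and the grid size are arbitrary choices, and the code assumes an integer shape of at least 2 so that the integrand vanishes at the origin.

```python
import math


def truncated_log_mean(eta: float, shape: int, width: float = 80.0,
                       steps: int = 20000) -> float:
    """Approximate h1(eta) = E[ln P | P > eta] for P ~ Gamma(shape, 1).

    Numerator and denominator of the ratio in (9) are both evaluated by
    composite Simpson's rule on [eta, eta + width]; the tail beyond
    eta + width is ignored and the common factor h/3 cancels.
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = width / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        p = eta + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        dens = math.exp(-p) * p ** (shape - 1)
        den += weight * dens
        if p > 0.0:  # the integrand ln(p) * dens tends to 0 as p -> 0
            num += weight * math.log(p) * dens
    return num / den


# k = 2, n = 4, so the shape is kn = 8.
for eta in (0.0, 2.0, 5.0, 10.0, 20.0):
    print(eta, truncated_log_mean(eta, 8))
```

At $\eta = 0$ the value agrees with $\psi(kn)$, and the printed values increase with $\eta$, in line with the infimum $\psi(kn)$ and the supremum $+\infty$ stated above.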
Case (ii) Suppose all $\mu_i$'s are negative, $i = 1, \ldots, k$: In this case several possibilities for the $w_i$'s may arise, which are (a) all $w_i$'s are non-negative, (b) all $w_i$'s are negative and (c) some $w_i$'s are non-negative and the remaining are negative. In the following discussion, we investigate all these cases in detail.
(a) When all $w_i$'s are non-negative, the range of $t$ is from 0 to $\infty$. Therefore the marginal density of $W$ is
\[
f_W(w) = \frac{n^k}{\Gamma(k(n-1))} \int_{0}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}\, dt, \quad w_i > 0,\ i = 1, \ldots, k.
\]
Consequently, the conditional density of $T$ given $W = w$ can be obtained as
\[
f_{T|W}(t \mid w) = \frac{e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}}{\int_{0}^{\infty} e^{-\left[n\sum_{i=1}^{k}(w_i t-\mu_i) + t\right]}\, t^{kn-1}\, dt}, \quad t > 0.
\]
Therefore, the conditional expectation of $\ln T$ given $W = w$ is given by
\[
E(\ln T \mid W = w) = \frac{\int_{0}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.
\]
Substituting the expression of $E(\ln T \mid W = w)$ in (7) and integrating, we get
\[
\hat{\phi}(w, \mu) = \ln u - \psi(kn). \tag{10}
\]
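For completeness, the integration step leading to (10) can be spelled out as follows. This is a sketch using only the substitution $p = ut$ and the standard identity $\int_0^\infty \ln p\; e^{-p} p^{m-1}\, dp = \Gamma(m)\psi(m)$; it assumes $u > 0$, which holds in this case because all $w_i$ are non-negative.

```latex
\begin{align*}
E(\ln T \mid W = w)
  &= \frac{\int_0^\infty \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_0^\infty e^{-ut}\, t^{kn-1}\, dt}
   = \frac{\int_0^\infty (\ln p - \ln u)\, e^{-p}\, p^{kn-1}\, dp}{\int_0^\infty e^{-p}\, p^{kn-1}\, dp} \\
  &= \frac{\Gamma(kn)\,\psi(kn)}{\Gamma(kn)} - \ln u = \psi(kn) - \ln u .
\end{align*}
```

The factor $u^{-kn}$ produced by the substitution appears in both integrals and cancels, and changing the sign as required by (7) gives $\ln u - \psi(kn)$.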
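(b) Now we consider the case when all $w_i$'s are negative: In this case, $t$ varies from 0 to $\eta_2$, where $\eta_2 = \min\{\mu_1/w_1, \ldots, \mu_k/w_k\}$. Similar to Case (a) we derive the conditional expectation of $\ln T$ given $W = w$ as
\[
E(\ln T \mid W = w) = \frac{\int_{0}^{\eta_2} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\eta_2} e^{-ut}\, t^{kn-1}\, dt}.
\]
When $u > 0$, we have from (7)
\[
\hat{\phi}(w, \mu) = \ln u - \frac{\int_{0}^{\eta_2'} \ln p\; e^{-p}\, p^{kn-1}\, dp}{\int_{0}^{\eta_2'} e^{-p}\, p^{kn-1}\, dp} = \ln u - h_2(\eta_2'), \text{ say},
\]
where $\eta_2' = \eta_2 u$. Using the monotone likelihood ratio property as in Case (i), we can show that $h_2(\eta_2')$ is a nondecreasing function of $\eta_2'$. Thus we get
\[
\sup_{\eta_2'} h_2(\eta_2') = \psi(kn) \quad \text{and} \quad \inf_{\eta_2'} h_2(\eta_2') = -\infty.
\]
Therefore,
\[
\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn).
\]
Similarly, when $u < 0$, we get
\[
\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.
\]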
(c) For the case when some $w_i$'s are non-negative and the remaining are negative, we show that the results are permutation invariant. Let $(i_1, \ldots, i_k)$ be a permutation of $(1, \ldots, k)$. We assume $w_{i_j} \ge 0$ for $j = 1, \ldots, r$ and $w_{i_j} < 0$ for $j = r+1, \ldots, k$, $r = 1, \ldots, k-1$. Thus the range of $t$ is from 0 to $\eta_3$, where $\eta_3 = \min\{\mu_{i_{r+1}}/w_{i_{r+1}}, \ldots, \mu_{i_k}/w_{i_k}\}$. In this case the conditional expectation of $\ln T$ given $W = w$ is obtained as
\[
E(\ln T \mid W = w) = \frac{\int_{0}^{\eta_3} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\eta_3} e^{-ut}\, t^{kn-1}\, dt}.
\]
Using the arguments as in Part (b), we get the supremum and infimum of $\hat{\phi}$ given in (7) as
\[
\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn),
\]
when $u > 0$; and
\[
\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty,
\]
when $u < 0$.
Case (iii) Some of the $\mu_i$'s are non-negative and the remaining are negative: In this case we show that finding the supremum and infimum of $\hat{\phi}(w, \mu)$ is invariant under different permutations of the $\mu_i$'s. We consider the case that among all $\mu_i$'s any $r$ ($r = 1, \ldots, k-1$) terms are non-negative and the remaining $(k-r)$ terms are negative. Let $(i_1, \ldots, i_k)$ be a permutation of $(1, \ldots, k)$ so that $\mu_{i_j} \ge 0$ for $j = 1, \ldots, r$, and $\mu_{i_j} < 0$ for $j = r+1, \ldots, k$. Therefore, when $\mu_{i_1}, \ldots, \mu_{i_r} \ge 0$, all corresponding $w_{i_j}$'s are also non-negative, for $j = 1, \ldots, r$, whereas
when $\mu_{i_{r+1}}, \ldots, \mu_{i_k} < 0$ there are several possibilities: all $(k-r)$ $w_i$'s are non-negative, all $(k-r)$ $w_i$'s are negative, or some of the $w_i$'s are non-negative and the remaining are negative. To find the supremum and infimum of $\hat{\phi}(w, \mu)$ given in (7) we use the technique used in Case (i).
(a) Let us consider the case $w_{i_1}, \ldots, w_{i_r}, \ldots, w_{i_k} > 0$: Under this case the range of $t$ is from $\eta_{11} = \max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}\}$ to $\infty$. The conditional expectation of $\ln T$ given $W = w$ is given by
\[
E(\ln T \mid W = w) = \frac{\int_{\eta_{11}}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_{11}}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.
\]
Hence, we get
\[
\sup_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn) \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.
\]
(b) Suppose $w_{i_1}, \ldots, w_{i_r} > 0$ and $w_{i_{r+1}}, \ldots, w_{i_k} < 0$: The range of $t$ is from $\eta_{12} = \max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}\}$ to $\eta_{13} = \min\{\mu_{i_{r+1}}/w_{i_{r+1}}, \ldots, \mu_{i_k}/w_{i_k}\}$. Therefore, the conditional expectation of $\ln T$ given $W = w$ can be obtained as
\[
E(\ln T \mid W = w) = \frac{\int_{\eta_{12}}^{\eta_{13}} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_{12}}^{\eta_{13}} e^{-ut}\, t^{kn-1}\, dt}.
\]
It can be shown as before that
\[
\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.
\]
(c) Let $w_{i_1}, \ldots, w_{i_r} > 0$ and, among the remaining $(k-r)$, some $w_i$'s be non-negative and the others negative: In this case we again show that the results are permutation invariant. Let $(j_1, \ldots, j_{k-r})$ be a permutation of $(i_{r+1}, \ldots, i_k)$. Suppose $w_{j_1}, \ldots, w_{j_m} \ge 0$ and $w_{j_{m+1}}, \ldots, w_{j_{k-r}} < 0$. The range of $t$ is from $\max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}, \mu_{j_1}/w_{j_1}, \ldots, \mu_{j_m}/w_{j_m}\}$ to $\min\{\mu_{j_{m+1}}/w_{j_{m+1}}, \ldots, \mu_{j_{k-r}}/w_{j_{k-r}}\}$. Arguing as earlier, it can be shown that
\[
\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.
\]
An application of the Brewster–Zidek technique (1974) on the function $R_1(\mu, \sigma, w, \delta_\phi)$ then completes the proof of the theorem.
As a consequence of this theorem we get the following corollary.
Corollary 1. Let $C_1 = \{w : u < e^{d}\}$ and $d = \psi(kn) - \psi(k(n-1))$. The BAEE $\delta_{c_0}$ of $Q(\sigma)$ is inadmissible and dominated by the estimator given by
\[
\delta_{IB} = \begin{cases} \ln(uT) - \psi(kn), & \text{if } w \in (B_1 \cap C_1) \cup (B_3 \cap C_1^c),\\[4pt] \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}
\]
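The estimator of Corollary 1 is straightforward to evaluate once the sample minima and T are available. The following is a minimal sketch under one explicit assumption: the improvement region is read as $(B_1 \cap C_1) \cup (B_3 \cap C_1^c)$, which is the reading suggested by the proof of Theorem 2. The function names and the data are hypothetical, and the digamma function is computed only for integer arguments.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def delta_ib(x_min: Sequence[float], t: float, n: int) -> float:
    """Improved estimator of ln(sigma) from k sample minima and T.

    Assumes the region (B1 and C1) or (B3 and not C1); outside it the
    function returns the BAEE ln T - psi(k(n-1)).
    """
    k = len(x_min)
    w = [x / t for x in x_min]
    u = n * sum(w) + 1.0
    d = digamma_int(k * n) - digamma_int(k * (n - 1))
    in_b1 = min(w) > 0.0          # all w_i positive
    in_b3 = max(w) < 0.0          # all w_i negative
    in_c1 = u < math.exp(d)
    if (in_b1 and in_c1) or (in_b3 and not in_c1):
        return math.log(u * t) - digamma_int(k * n)
    return math.log(t) - digamma_int(k * (n - 1))


# Hypothetical data with k = 2 and n = 3: small positive minima fall in C1.
print(delta_ib([0.1, 0.2], 5.5, 3))
# Large minima fall outside C1, so the BAEE is returned unchanged.
print(delta_ib([2.0, 1.0], 5.5, 3))
```

On $B_1$ the improved estimator never exceeds the BAEE, since $u < e^{d}$ implies $\ln(uT) - \psi(kn) < \ln T - \psi(k(n-1))$; the sketch therefore only ever pulls the estimate downwards when all minima are positive.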
Remark 2. We also study the entropy estimation problem when it is known a priori that all µi ’s are bounded below. Such a situation may arise when the minimum guarantee time of components is known to be more than a pre-specified constant due to physical constraints. In this case one may take without loss of generality that µ(1) ≥ 0, where µ(1) = min{µ1 , . . . , µk }. Here the MLE of Q (σ ) is same as the MLE obtained for unrestricted parameter space. The inadmissibility of the BAEE δc0 of Q (σ ) can be established using the steps of Case (i) of the proof of the Theorem 2. The improved estimator is given by
\[
\delta_{IB}^{+} = \begin{cases} \ln(uT) - \psi(kn), & \text{if } w \in C_1,\\[4pt] \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}
\]
Remark 3. We have also considered the entropy estimation when, contrary to the case in Remark 2, the guarantee times are known to be bounded from above. Here one may assume a priori that µ(k) < 0, where µ(k) = max{µ1 , . . . , µk }. In this case,
the MLE of $Q(\sigma)$ gets modified as $\delta_{RM} = \ln T^{0} - \ln(kn)$, where $T^{0} = \sum_{i=1}^{k} (Y_i - nX_{i(1)}^{0})$, $X_{i(1)}^{0} = \min\{0, X_{i(1)}\}$, $i = 1, \ldots, k$. This is the restricted maximum likelihood estimator (RMLE) of the entropy. Further, the inadmissibility of the BAEE $\delta_{c_0}$ is proved using the steps used in Case (ii) of the proof of Theorem 2. Let $C_2 = \{w : w_{(r)} < 0\}$, $C_3 = \{w : w_{(r+1)} > 0\}$. The improved estimator is then
\[
\delta_{IB}^{-} = \begin{cases} \ln(uT^{0}) - \psi(kn), & \text{if } w \in B_1 \cup (B_3 \cap C_1^c) \cup (C_2 \cap C_3 \cap C_1^c),\\[4pt] \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}
\]
3. Unequal sample sizes
The results of the previous section can be extended to the case when random samples with unequal sample sizes are drawn from k exponential populations. The proofs, though somewhat more complicated, are similar to those of the results in Section 2, and hence are omitted. For the sake of completeness, the notation and results are stated here in full detail. Suppose $(X_{i1}, \ldots, X_{in_i})$, $i = 1, \ldots, k$, are independent random samples drawn from the populations $\Pi_1, \ldots, \Pi_k$ respectively, with the pdf of the i-th population given by (1). On the basis of the i-th sample $\{X_{i1}, \ldots, X_{in_i}\}$, $(X_{i(1)}^{*}, Y_i^{*})$ is a complete and sufficient statistic for $(\mu_i, \sigma)$, where $X_{i(1)}^{*} = \min_{1\le j\le n_i} X_{ij}$ and $Y_i^{*} = \sum_{j=1}^{n_i} X_{ij}$. Let $X_{(1)}^{*} = (X_{1(1)}^{*}, \ldots, X_{k(1)}^{*})$, $T^{*} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - X_{i(1)}^{*})$ and $N = \sum_{i=1}^{k} n_i$. Then $(X_{(1)}^{*}, T^{*})$ is a complete and sufficient statistic for $(\mu, \sigma)$, and $X_{(1)}^{*}$ and $T^{*}$ are independently distributed. Also $X_{i(1)}^{*}$ follows an exponential distribution with location parameter $\mu_i$ and scale parameter $\sigma/n_i$, and $2T^{*}/\sigma$ follows a chi-square distribution with $2(N-k)$ degrees of freedom. The MLE and the UMVUE of $Q(\sigma)$ are $\delta_{ML}^{*} = \ln T^{*} - \ln N$ and $\delta_{MV}^{*} = \ln T^{*} - \psi(N-k)$ respectively. The problem under study is also invariant with respect to $G_{a,b}$, the group of affine transformations. The form of an affine equivariant estimator is $\delta_c^{*}(X_{(1)}^{*}, T^{*}) = \ln T^{*} - c$ for some real constant c. In the following theorem we get the BAEE.
Theorem 3. Under the squared error loss function (2), the BAEE of $Q(\sigma)$ is $\delta_{c_0^{*}}^{*}(X_{(1)}^{*}, T^{*})$, where $c_0^{*} = \psi(N-k)$.
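The unequal sample size versions of the statistics and estimators can be sketched in the same way as for equal sizes. The code below is an illustration only; the function names and the data are hypothetical, and the digamma function is evaluated through a finite sum because $N - k$ is an integer.

```python
import math
from typing import List, Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def pooled_statistics(samples: Sequence[Sequence[float]]) -> Tuple[List[float], float, int]:
    """Return (X*_(1), T*, N) for k samples of possibly unequal sizes."""
    minima = [min(s) for s in samples]
    t_star = sum(sum(x - m for x in s) for s, m in zip(samples, minima))
    n_total = sum(len(s) for s in samples)
    return minima, t_star, n_total


def delta_ml_star(t_star: float, n_total: int) -> float:
    """MLE of ln(sigma): ln T* - ln N."""
    return math.log(t_star) - math.log(n_total)


def delta_baee_star(t_star: float, n_total: int, k: int) -> float:
    """BAEE of ln(sigma): ln T* - psi(N - k)."""
    return math.log(t_star) - digamma_int(n_total - k)


# Hypothetical data: k = 2 populations with n_1 = 3 and n_2 = 2.
minima, t_star, n_total = pooled_statistics([[2.0, 3.5, 2.5], [1.0, 4.0]])
print(minima, t_star, n_total)
print(delta_ml_star(t_star, n_total), delta_baee_star(t_star, n_total, 2))
```

With equal sizes $n_i = n$ these functions reduce to the quantities of Section 2, since then $N = kn$ and $N - k = k(n-1)$.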
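As in Section 2, we can obtain the form of the scale equivariant estimator of $Q(\sigma)$ as
\[
\delta_{\phi^{*}}^{*}(W^{*}, T^{*}) = \ln T^{*} + \phi^{*}(W^{*}), \tag{11}
\]
where $W^{*} = (W_1^{*}, \ldots, W_k^{*})$ and $W_i^{*} = X_{i(1)}^{*}/T^{*}$. Suppose $B_1^{*} = \{w^{*} : w_{(1)}^{*} > 0\}$, $B_2^{*} = \{w^{*} : u^{*} < \exp(\phi^{*}(w^{*}) + \psi(N))\}$, $B_3^{*} = \{w^{*} : w_{(k)}^{*} < 0\}$, $u^{*} = \sum_{i=1}^{k} n_i w_i^{*} + 1$, $w_{(1)}^{*} = \min\{w_1^{*}, \ldots, w_k^{*}\}$, $w_{(k)}^{*} = \max\{w_1^{*}, \ldots, w_k^{*}\}$ and $w_i^{*} = x_{i(1)}^{*}/t^{*}$ for $i = 1, \ldots, k$. For a function $\phi^{*}$ in (11), define
\[
\phi_0^{*}(w^{*}) = \begin{cases} \ln\left(\sum_{i=1}^{k} n_i w_i^{*} + 1\right) - \psi(N), & \text{if } w^{*} \in (B_1^{*} \cap B_2^{*}) \cup (B_3^{*} \cap B_2^{*c}),\\[4pt] \phi^{*}(w^{*}), & \text{otherwise.} \end{cases} \tag{12}
\]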
The following theorem proves a general inadmissibility result for the estimators of the form (11).
Theorem 4. Let $\delta_{\phi^{*}}^{*}$ be a scale equivariant estimator of the form (11) and $\phi_0^{*}(w^{*})$ be as defined in (12). If there exists some $(\mu, \sigma)$ such that $P_{(\mu,\sigma)}(\phi_0^{*}(W^{*}) \neq \phi^{*}(W^{*})) > 0$, then under the squared error loss function (2), the estimator $\delta_{\phi_0^{*}}^{*}$ dominates $\delta_{\phi^{*}}^{*}$.
In the following corollaries, the estimator improving upon the BAEE is given for various cases.
Corollary 2. The BAEE $\delta_{c_0^{*}}^{*}$ of $Q(\sigma)$ is inadmissible and dominated by the estimator given by
\[
\delta_{IB}^{*} = \begin{cases} \ln(u^{*}T^{*}) - \psi(N), & \text{if } w^{*} \in (B_1^{*} \cap C_1^{*}) \cup (B_3^{*} \cap C_1^{*c}),\\[4pt] \ln T^{*} - \psi(N-k), & \text{otherwise,} \end{cases}
\]
where $C_1^{*} = \{w^{*} : u^{*} < e^{d^{*}}\}$ and $d^{*} = \psi(N) - \psi(N-k)$.
Corollary 3. The BAEE $\delta_{c_0^{*}}^{*}$ of $Q(\sigma)$ is inadmissible when $\mu_{(1)} \ge 0$ and dominated by the estimator given by
\[
\delta_{IB}^{*+} = \begin{cases} \ln(u^{*}T^{*}) - \psi(N), & \text{if } w^{*} \in C_1^{*},\\[4pt] \ln T^{*} - \psi(N-k), & \text{otherwise.} \end{cases}
\]
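Corollary 4. The estimator
\[
\delta_{IB}^{*-} = \begin{cases} \ln(u^{*}T^{*}) - \psi(N), & \text{if } w^{*} \in B_1^{*} \cup (B_3^{*} \cap C_1^{*c}) \cup (C_2^{*} \cap C_3^{*} \cap C_1^{*c}),\\[4pt] \ln T^{*} - \psi(N-k), & \text{otherwise,} \end{cases}
\]
where $C_2^{*} = \{w^{*} : w_{(r)}^{*} < 0\}$, $C_3^{*} = \{w^{*} : w_{(r+1)}^{*} > 0\}$, $r = 1, \ldots, k-1$, dominates the BAEE $\delta_{c_0^{*}}^{*}$ of $Q(\sigma)$ when $\mu_{(k)} < 0$.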
[Figure 1: twelve panels (a)–(l); each panel plots the risk R against µ1 and µ2.]
Fig. 1. The risk plots of the estimators $\delta_{IB}$, $\delta_{IB}^{+}$, $\delta_{IB}^{-}$ and $\delta_{RM}$ for n = (4, 6, 8). Graphs (a, b, c) for $\delta_{IB}$, graphs (d, e, f) for $\delta_{IB}^{+}$, graphs (g, h, i) for $\delta_{IB}^{-}$ and graphs (j, k, l) for $\delta_{RM}$ respectively.
4. Heuristic discussion We have considered the problem of estimating entropy of k shifted exponential populations with a common scale but different locations. The entropy expression is related to the logarithm of the scale parameter. Stein (1964) first showed that the best equivariant estimator of normal variance is inadmissible. Later this phenomenon was observed for some other
scale parameter families including the exponential distribution (see Brown, 1968 and Arnold, 1970). Misra et al. (2005) obtained Stein type and Brewster–Zidek type estimators for the entropy of a multivariate normal population. In this paper we derive estimators dominating the BAEE for the entropy of k shifted exponential populations. The model is important as the structure of the sufficient statistics gets modified.
5. Numerical comparisons
In this section we compare numerically the risk performance of the improved estimators $\delta_{IB}$, $\delta_{IB}^{+}$ and $\delta_{IB}^{-}$ with the BAEE $\delta_{c_0}$. It is noticed that for all cases of the $\mu_i$'s the risk differences become small for large values of n. For n ≥ 100 the risk values are the same up to six decimal places. For the purpose of presentation of the numerical study, we have taken n = 4, 6, 8, 10, 15, 20, 25, 30 and 50 and k = 2. The risk values of the proposed estimators are calculated using simulations based on 10 000 samples of size n. Since the risk functions of the estimators are functions of $(\mu_1/\sigma, \ldots, \mu_k/\sigma)$, we take σ = 1 without loss of generality. The results of the numerical study are presented through graphs. The graphs corresponding to n = 4, 6 and 8 are presented in Fig. 1 in this paper, whereas for n = 10, 15, 20, 25, 30 and 50, they are placed on the website: http://www.facweb.iitkgp.ernet.in/~smsh/graph.pdf. The following observations are made based on the risk values.
(a) Under the squared error loss function the risk values of the MLE $\delta_{ML}$ are 0.320865, 0.161249 and 0.104970 and those of the BAEE $\delta_{c_0}$ are 0.178992, 0.104975 and 0.075129 for n = 4, 6, 8 respectively. Graphs (a), (b), (c) in Fig. 1 represent the risk plot of the estimator $\delta_{IB}$. We observe that for different values of n the regions of improvement of the estimator $\delta_{IB}$ over $\delta_{c_0}$ are different. Keeping µ1 fixed, if we decrease the magnitude of µ2, then the margin of improvement is larger. It is also noticed that we get considerable improvement when both µ1 and µ2 are close to zero. In this case, the region of improvement is approximately |µ1| ≤ 0.5 and |µ2| ≤ 0.5. The maximum improvement observed is about 12%.
(b) When both µ1 and µ2 are non-negative, the risk values of the estimators are plotted in graphs (d), (e), (f) in Fig. 1. For large values of µ1 and µ2 (approximately ≥ 1), the risk of $\delta_{IB}^{+}$ is equal to $R(\delta_{c_0})$. As µ1 and µ2 approach zero the risk of $\delta_{IB}^{+}$ decreases, and just before 0 it stops decreasing and starts increasing. The maximum improvement observed is about 12%.
(c) When both µ1 and µ2 are negative, graphs (g), (h), (i) and (j), (k), (l) in Fig. 1 represent the risk plots of the estimators $\delta_{IB}^{-}$ and $\delta_{RM}$ respectively. From the numerical risk values it is observed that the risk values of $\delta_{RM}$ and $\delta_{IB}^{-}$ decrease when both µ1 and µ2 increase. The performance of $\delta_{RM}$ is always better than that of $\delta_{ML}$. We also see that the estimator $\delta_{IB}^{-}$ always performs better than $\delta_{RM}$. The maximum improvement observed is about 27%.
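A simulation of the kind described in this section can be sketched as follows. This is not the authors' code: it is a minimal Monte Carlo sketch with σ = 1, it reads the improvement region of $\delta_{IB}$ as $(B_1 \cap C_1) \cup (B_3 \cap C_1^c)$, and the function names, seed and replication count are arbitrary choices.

```python
import math
import random
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def simulate_risks(mu: Sequence[float], n: int, reps: int = 10000,
                   seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo risks of the BAEE and of delta_IB at sigma = 1.

    Both estimators are evaluated on the same simulated samples, and
    since ln(sigma) = 0 the squared error is the squared estimate.
    """
    rng = random.Random(seed)
    k = len(mu)
    c0 = digamma_int(k * (n - 1))
    c1 = digamma_int(k * n)
    threshold = math.exp(c1 - c0)  # e^d in Corollary 1
    se_baee = 0.0
    se_ib = 0.0
    for _ in range(reps):
        samples = [[m + rng.expovariate(1.0) for _ in range(n)] for m in mu]
        minima = [min(s) for s in samples]
        t = sum(sum(s) - n * mn for s, mn in zip(samples, minima))
        baee = math.log(t) - c0
        w = [mn / t for mn in minima]
        u = n * sum(w) + 1.0
        in_c1 = u < threshold
        if (min(w) > 0.0 and in_c1) or (max(w) < 0.0 and not in_c1):
            ib = math.log(u * t) - c1
        else:
            ib = baee
        se_baee += baee ** 2
        se_ib += ib ** 2
    return se_baee / reps, se_ib / reps


# k = 2, n = 4, both locations at zero, where the gain is expected to be visible.
print(simulate_risks((0.0, 0.0), n=4, reps=2000))
```

Evaluating both estimators on the same samples reduces the noise in the estimated risk difference, which matters here because the improvement is small relative to the Monte Carlo error of each risk taken separately.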
Acknowledgments
The authors thank the reviewers and a co-editor-in-chief for their valuable suggestions which have considerably improved the content and the presentation of the paper.
References
Adami, C., 2004. Information theory in molecular biology. Phys. Life Rev. 1, 3–22.
Ahmed, N.A., Gokhale, D.V., 1989. Entropy expressions and their estimators for multivariate distributions. IEEE Trans. Inf. Theory 35, 688–692.
Arnold, B.C., 1970. Inadmissibility of the usual scale estimate for a shifted exponential distribution. J. Amer. Statist. Assoc. 65, 1260–1264.
Bobotas, P., Kourouklis, S., 2009. Strawderman-type estimators for a scale parameter with application to the exponential distribution. J. Statist. Plann. Inference 139, 3001–3012.
Brewster, J.F., 1974. Alternative estimators for the scale parameter of the exponential distribution with unknown location. Ann. Statist. 2, 553–557.
Brewster, J.F., Zidek, J.V., 1974. Improving on equivariant estimators. Ann. Statist. 2, 21–38.
Brown, L.D., 1968. Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. Ann. Math. Statist. 39, 29–48.
Cover, T.M., Thomas, J.A., 1999. Elements of Information Theory. Wiley, New York.
Dragulescu, A., Yakovenko, V.M., 2001. Evidence for the exponential distribution of income in the USA. Eur. Phys. J. B 20, 585–589.
Kayal, S., Kumar, S., 2011a. Estimating entropy of an exponential population under linex loss function. J. Indian Statist. Assoc. 49, 91–112.
Kayal, S., Kumar, S., 2011b. On estimating the Shannon entropy of several exponential populations. Int. J. Stat. Econ. 7, 42–52.
Lazo, A.C.G., Rathie, P.N., 1978. On the entropy of continuous probability distributions. IEEE Trans. Inf. Theory 24, 120–122.
Lehmann, E.L., Casella, G., 1998. Theory of Point Estimation, second ed. Springer, New York.
Lehmann, E.L., Romano, J.P., 2009. Testing Statistical Hypotheses. Springer, New York.
Liu, Y., Liu, C., Wang, D., 2011. Understanding atmospheric behaviour in terms of entropy: a review of applications of the second law of thermodynamics to meteorology. Entropy 13, 211–240.
Misra, N., Singh, H., Demchuk, E., 2005. Estimation of the entropy of a multivariate normal distribution. J. Multivariate Anal. 92, 324–342.
Pal, N., Jin, C., Lim, W., 2006. Handbook of Exponential and Related Distributions for Engineers and Scientists. Chapman and Hall/CRC, Boca Raton.
Petropoulos, C., Kourouklis, S., 2002. A class of improved estimators for the scale parameter of an exponential distribution with unknown location. Comm. Statist. Theory Methods 31, 325–335.
Robinson, D.W., 2011. Entropy and uncertainty. Entropy 10, 493–506.
Sakai, A.K., Burries, T.A., 1985. Growth in male and female aspen clones: a twenty-five year longitudinal study. Ecology 66, 1921–1927.
Shannon, C., 1948. The mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423.
Sharma, D., 1977. Estimation of the reciprocal of the scale parameter in a shifted exponential distribution. Sankhyā Ser. A 39, 203–205.
Stein, C., 1964. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16, 155–160.
Zidek, J.V., 1973. Estimating the scale parameter of the exponential distribution with unknown location. Ann. Statist. 1, 264–278.