Journal of Computational and Applied Mathematics 261 (2014) 95–102
Estimation of entropy using random sampling

Amer Ibrahim Al-Omari, Department of Mathematics, Faculty of Science, Al al-Bayt University, Mafraq 25113, Jordan
Article history: Received 4 June 2013; Received in revised form 8 September 2013

Abstract
In this paper, three new entropy estimators of continuous random variables are proposed using simple random sampling (SRS), ranked set sampling (RSS) and double ranked set sampling (DRSS) techniques. The new estimators are obtained by modifying the estimators suggested by Noughabi and Arghami (2010) and Ebrahimi et al. (1994). A numerical comparison between the suggested estimators and Vasicek's (1976) estimator is conducted in terms of root mean square error (RMSE) and bias. Our results reveal that the suggested estimators have a smaller mean squared error than Vasicek's estimator. Also, the suggested estimator under double ranked set sampling is more efficient than the suggested estimators based on SRS and RSS. © 2013 Elsevier B.V. All rights reserved.
Keywords: Entropy; Root mean square error; Simple random sampling; Ranked set sampling; Double ranked set sampling
1. Introduction

Assume that the random variable X has a continuous probability density function (pdf) f(x) and cumulative distribution function (cdf) F(x). Shannon [1] defined the differential entropy H(f) of the random variable X as
$$H(f) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx. \qquad (1)$$
The entropy is a measure of uncertainty and dispersion. Many authors have considered the problem of estimating the entropy of continuous random variables; see, for example, [2–5]. Vasicek [4] showed that the entropy in (1) can be written as
$$H(f) = \int_{0}^{1} \log\left\{\frac{d}{dp}F^{-1}(p)\right\} dp. \qquad (2)$$
Vasicek estimated the expression in (2) by replacing the cdf F(x) with the empirical cdf $F_n(x)$ and using a difference operator in place of the differential operator, so that the derivative of $F^{-1}(p)$ is estimated by a function of the order statistics. Let $X_1, X_2, \ldots, X_n$ be a simple random sample of size n from F(x), and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of this sample. Vasicek [4] suggested the estimator of H
$$HV_{(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{2m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\}, \qquad (3)$$
where m is a positive integer known as the window size, m < n/2, and
$$X_{(i)} = \begin{cases} X_{(1)}, & \text{if } i < 1,\\ X_{(n)}, & \text{if } i > n. \end{cases}$$
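For concreteness, a minimal Python sketch of the estimator in (3) is given below; the function name vasicek_entropy and the use of NumPy are our own choices and not part of the paper.

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek's spacing-based entropy estimator HV(m, n) of Eq. (3)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 1 <= m < n / 2:
        raise ValueError("the window size m must satisfy 1 <= m < n/2")
    i = np.arange(n)
    upper = x[np.minimum(i + m, n - 1)]   # X_(i+m), clipped to X_(n) when i + m > n
    lower = x[np.maximum(i - m, 0)]       # X_(i-m), clipped to X_(1) when i - m < 1
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))
```

For a large standard normal sample, vasicek_entropy should approach H(f) = log(2πe)/2 ≈ 1.419, the value used in the simulations below.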
Vasicek showed that $HV_{(m,n)}$ converges in probability to H(f) as n → ∞, m → ∞, and m/n → 0. Van Es [6] suggested the entropy estimator
$$HVE_{(m,n)} = \frac{1}{n-m}\sum_{i=1}^{n-m}\log\left\{\frac{n+1}{m}\left(X_{(i+m)}-X_{(i)}\right)\right\} + \sum_{k=m}^{n}\frac{1}{k} + \log\frac{m}{n+1}, \qquad (4)$$
and, under some conditions, proved the consistency and asymptotic normality of the estimator.
Ebrahimi et al. [2] adjusted the weight n/(2m) in Vasicek's [4] estimator to assign smaller weights at the boundaries and proposed the estimator
$$HE_{(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\}, \qquad (5)$$
where
$$c_i = \begin{cases} 1 + \dfrac{i-1}{m}, & 1 \le i \le m,\\[4pt] 2, & m+1 \le i \le n-m,\\[4pt] 1 + \dfrac{n-i}{m}, & n-m+1 \le i \le n. \end{cases}$$
Based on simulations, Ebrahimi et al. [2] showed that their estimator has smaller bias and mean square error than Vasicek's estimator. They also proved that $HE_{(m,n)}$ converges in probability to H(f) as n → ∞, m → ∞ and m/n → 0.
Noughabi and Arghami [7] suggested a modified version of the Ebrahimi et al. [2] entropy estimator and proved that it performs better than the Vasicek [4] and Ebrahimi et al. [2] estimators. Their proposed estimator is given by
$$HNA_{(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\}, \qquad (6)$$
where
$$c_i = \begin{cases} 1, & 1 \le i \le m,\\ 2, & m+1 \le i \le n-m,\\ 1, & n-m+1 \le i \le n. \end{cases}$$
They proved the consistency of the estimator, i.e., $HNA_{(m,n)} \stackrel{P}{\longrightarrow} H(f)$ as n → ∞, m → ∞, m/n → 0.
Noughabi and Noughabi [5] suggested a new estimator of the entropy of an unknown continuous probability density function as
$$HNN_{(m,n)} = -\frac{1}{n}\sum_{i=1}^{n}\log\left\{s_i(n,m)\right\}, \qquad (7)$$
where
$$s_i(n,m) = \begin{cases} \hat{f}\left(X_{(i)}\right), & 1 \le i \le m,\\[6pt] \dfrac{2m/n}{X_{(i+m)}-X_{(i-m)}}, & m+1 \le i \le n-m,\\[6pt] \hat{f}\left(X_{(i)}\right), & n-m+1 \le i \le n, \end{cases}$$
and
$$\hat{f}(X_i) = \frac{1}{nh}\sum_{j=1}^{n} k\left(\frac{X_i - X_j}{h}\right),$$
where h is the bandwidth and k is a kernel function satisfying $\int_{-\infty}^{\infty} k(x)\,dx = 1$. They proved that $HNN_{(m,n)} \stackrel{P}{\longrightarrow} H(f)$ as n → ∞, m → ∞, m/n → 0. Note that the kernel function in [5] is chosen to be the standard normal density and the bandwidth h is chosen by the normal smoothing formula $h = 1.06\, s\, n^{-1/5}$, where s is the sample standard deviation.
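As an illustration of (7), the following hedged Python sketch combines the boundary density estimates with the interior slope terms; the Gaussian kernel and the bandwidth h = 1.06 s n^(-1/5) follow the description above, while the function names are our own.

```python
import numpy as np

def hnn_entropy(x, m):
    """Sketch of the Noughabi-Noughabi estimator HNN(m, n) of Eq. (7)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    s = np.std(x, ddof=1)
    h = 1.06 * s * n ** (-0.2)                   # normal smoothing bandwidth

    def f_hat(t):
        """Gaussian-kernel density estimate at the point t."""
        u = (t - x) / h
        return np.mean(np.exp(-0.5 * u ** 2)) / (h * np.sqrt(2.0 * np.pi))

    si = np.empty(n)
    for i in range(n):                           # 0-based; the paper's index is i + 1
        if m <= i < n - m:                       # interior indices m+1 <= i+1 <= n-m
            si[i] = (2.0 * m / n) / (x[i + m] - x[i - m])
        else:                                    # boundary indices: density estimate
            si[i] = f_hat(x[i])
    return -np.mean(np.log(si))
```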
Correa [3] proposed a modification of Vasicek's estimator with a smaller mean square error,
$$HC_{mn} = -\frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{\sum_{j=i-m}^{i+m}\left(X_{(j)}-\bar{X}_{(i)}\right)(j-i)}{n\sum_{j=i-m}^{i+m}\left(X_{(j)}-\bar{X}_{(i)}\right)^{2}}\right\}, \qquad (8)$$
where $\bar{X}_{(i)} = \frac{1}{2m+1}\sum_{j=i-m}^{i+m} X_{(j)}$. For more about entropy estimators, see [8–13].
The rest of this paper is organized as follows. In Section 2, the suggested estimators using SRS, RSS and DRSS are introduced. Numerical comparisons between the suggested estimators and that of Vasicek [4] are given in Section 3. Finally, Section 4 summarizes our conclusions.
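To make the estimators reviewed above concrete, here is a hedged sketch of Correa's estimator (8); it reuses the order-statistic clipping convention of (3), and the helper name correa_entropy is ours.

```python
import numpy as np

def correa_entropy(x, m):
    """Sketch of Correa's local least-squares estimator HC(m, n) of Eq. (8)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    total = 0.0
    for i in range(1, n + 1):                      # i = 1, ..., n as in the paper
        j = np.arange(i - m, i + m + 1)            # window j = i-m, ..., i+m
        xj = x[np.clip(j, 1, n) - 1]               # X_(j), clipped to X_(1) / X_(n)
        xbar = xj.mean()                           # local mean over the 2m+1 points
        num = np.sum((xj - xbar) * (j - i))
        den = n * np.sum((xj - xbar) ** 2)
        total += np.log(num / den)
    return -total / n
```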
2. The proposed entropy estimators

In this section, the suggested estimators are described based on SRS, RSS, and DRSS. These estimators utilize the coefficients in [7,2]. It is clear that $s_i(m,n) = \frac{n}{2m}\left(X_{(i+m)}-X_{(i-m)}\right)$ is not a good formula for the slope when i ≤ m or i ≥ n − m + 1. Therefore, to overcome this problem at these points, we suggest a new modification to the numerator and/or the denominator.

2.1. Using SRS

Let $X_1, X_2, \ldots, X_n$ be a simple random sample (SRS) of size n from a distribution function F(x). Following Noughabi and Arghami [7] and Ebrahimi et al. [2], the first suggested estimator of the entropy of an unknown continuous probability density function f is given by
$$AHE_{SRS(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\}, \qquad (9)$$
where
$$c_i = \begin{cases} 1 + \dfrac{1}{2}, & 1 \le i \le m,\\[4pt] 2, & m+1 \le i \le n-m,\\[4pt] 1 + \dfrac{1}{2}, & n-m+1 \le i \le n, \end{cases}$$
and $X_{(i-m)} = X_{(1)}$ for $i \le m$ and $X_{(i+m)} = X_{(n)}$ for $i \ge n-m$. Comparing (3) with (9), we have
$$\begin{aligned}
AHE_{SRS(m,n)} &= \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\}
 = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{2n}{2 c_i m}\left(X_{(i+m)}-X_{(i-m)}\right)\right\}\\
&= HV_{SRS(m,n)} + \frac{1}{n}\sum_{i=1}^{n}\log\frac{2}{c_i}
 = HV_{SRS(m,n)} + \frac{1}{n}\left(\sum_{i=1}^{m}\log\frac{4}{3} + \sum_{i=n-m+1}^{n}\log\frac{4}{3}\right)\\
&= HV_{SRS(m,n)} + \frac{2}{n}\, m \log\frac{4}{3}. \qquad (10)
\end{aligned}$$
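A minimal sketch of the proposed estimator (9), with the weights c_i = 3/2, 2, 3/2 and the boundary convention above, might look as follows (the function name is ours):

```python
import numpy as np

def ahe_srs(x, m):
    """Sketch of the proposed SRS estimator AHE_SRS(m, n) of Eq. (9)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(n)
    upper = x[np.minimum(i + m, n - 1)]   # X_(i+m) = X_(n) for i >= n - m
    lower = x[np.maximum(i - m, 0)]       # X_(i-m) = X_(1) for i <= m
    c = np.full(n, 2.0)                   # c_i = 2 in the interior
    c[:m] = 1.5                           # c_i = 1 + 1/2 for 1 <= i <= m
    c[n - m:] = 1.5                       # c_i = 1 + 1/2 for n - m + 1 <= i <= n
    return np.mean(np.log(n / (c * m) * (upper - lower)))
```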
Remark. The entropy $H(f_n^{ME})$ of an empirical maximum entropy density $f_n^{ME}$, which is related to $HV_{SRS(1,n)}$ and $AHE_{SRS(1,n)}$, can be computed following Theil [11] as
$$\begin{aligned}
H(f_n^{ME}) &= HV_{SRS(1,n)} + \frac{2 - 2\log 2}{n}
 = AHE_{SRS(1,n)} - \frac{2}{n}\log\frac{4}{3} + \frac{2 - 2\log 2}{n}\\
&= AHE_{SRS(1,n)} + \frac{2}{n}\left(1 - \log\frac{4}{3} - \log 2\right).
\end{aligned}$$

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be a simple random sample from a distribution function F(x). Then $AHE_{SRS(m,n)} > HV_{SRS(m,n)}$.

Proof. From (10) we have $AHE_{SRS(m,n)} = HV_{SRS(m,n)} + \frac{2}{n} m \log\frac{4}{3}$. Since $\frac{2}{n} m \log\frac{4}{3} > 0$, the proof is complete.
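Identity (10) and Theorem 1 can be checked numerically with the sketches above (vasicek_entropy and ahe_srs are the hypothetical helpers from the previous code blocks):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=30)                  # any continuous sample will do
n, m = x.size, 5

gap = ahe_srs(x, m) - vasicek_entropy(x, m)
print(gap, 2.0 * m / n * np.log(4.0 / 3.0))   # the two numbers should coincide
# gap > 0 for every sample, in line with Theorem 1
```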
2.2. Using RSS

The ranked set sampling method was suggested by McIntyre [14] for estimating the population mean of pasture and forage yields. The RSS can be described as follows: select n simple random samples, each of size n, from the target population and visually rank the units within each sample with respect to the variable of interest. From the ith sample (i = 1, 2, ..., n) of n units, the ith smallest ranked unit is measured. The procedure is repeated h times if needed to increase the sample size to hn units.
Assume that the variable of interest X has a probability density function f(x) and a cumulative distribution function F(x) with mean µ and variance σ². Let $f_{(i)}(x)$ be the pdf of the ith order statistic $X_{(i)}$ (1 ≤ i ≤ n) of a random sample of size n, and let $X_{j(i)}$ denote the ith order statistic from the jth sample (j = 1, 2, ..., n). Then the measured RSS units are denoted by $X^{*}_{1(1)}, X^{*}_{2(2)}, \ldots, X^{*}_{n(n)}$. The cdf of $X_{(i)}$ is given by
$$F_{(i)}(x) = \sum_{j=i}^{n}\binom{n}{j} F^{j}(x)\left[1-F(x)\right]^{n-j}, \quad -\infty < x < \infty,$$
with the pdf defined as
$$f_{(i)}(x) = n\binom{n-1}{i-1} F^{i-1}(x)\left[1-F(x)\right]^{n-i} f(x), \quad -\infty < x < \infty.$$
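The sampling scheme described above can be simulated as follows, assuming perfect ranking (ranking on the measured values themselves, a common simulation convenience); the function name and its draw argument are our own.

```python
import numpy as np

def ranked_set_sample(rng, n, draw):
    """One RSS cycle of size n under perfect ranking: from the i-th of n
    independent samples of size n, measure only its i-th smallest unit."""
    rss = np.empty(n)
    for i in range(n):
        sample = np.sort(draw(rng, n))   # rank the i-th set of n units
        rss[i] = sample[i]               # keep its (i + 1)-th order statistic
    return rss

# example: an RSS of size 10 from the standard normal distribution
rng = np.random.default_rng(1)
x_rss = ranked_set_sample(rng, 10, lambda r, k: r.standard_normal(k))
```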
The mean and the variance of the ith order statistic $X_{(i)}$ are
$$\mu_{(i)} = \int_{-\infty}^{\infty} x\, f_{(i)}(x)\,dx \quad \text{and} \quad \sigma^{2}_{(i)} = \int_{-\infty}^{\infty}\left(x-\mu_{(i)}\right)^{2} f_{(i)}(x)\,dx, \quad \text{respectively}.$$
The SRS estimator of the population mean is given by $\bar{X}_{SRS} = \frac{1}{n}\sum_{i=1}^{n} X_i$, with variance $\mathrm{Var}(\bar{X}_{SRS}) = \frac{\sigma^2}{n}$. The RSS estimator of the population mean is defined as $\bar{X}_{RSS} = \frac{1}{n}\sum_{i=1}^{n} X^{*}_{i(i)}$, with variance $\mathrm{Var}(\bar{X}_{RSS}) = \frac{\sigma^2}{n} - \frac{1}{n^2}\sum_{i=1}^{n}\left(\mu_{(i)}-\mu\right)^{2}$.
Takahasi and Wakimoto [15] showed that the efficiency of RSS relative to SRS for estimating the population mean satisfies
$$1 \le \mathrm{eff}\left(\bar{X}_{RSS}, \bar{X}_{SRS}\right) = \frac{\mathrm{Var}(\bar{X}_{SRS})}{\mathrm{Var}(\bar{X}_{RSS})} \le \frac{n+1}{2}.$$
The lower bound is attained if and only if the underlying distribution is degenerate, while the upper bound is attained if and only if the underlying distribution of the data is rectangular (uniform). Further, it is shown that the parent pdf f(x) and the corresponding mean can be expressed as $f(x) = \frac{1}{n}\sum_{i=1}^{n} f_{(i)}(x)$ and $\mu = \frac{1}{n}\sum_{i=1}^{n}\mu_{(i)}$. For more about RSS, see [16–22].
Let $X^{*}_{(1)}, X^{*}_{(2)}, \ldots, X^{*}_{(n)}$ be a ranked set sample of size n selected from the distribution of interest. The Vasicek [4] estimator using RSS is defined as
$$HV_{RSS(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{2m}\left(X^{*}_{(i+m)}-X^{*}_{(i-m)}\right)\right\}, \qquad (11)$$
see [23] for more details. The second suggested estimator of entropy using RSS can be defined as
$$AHE_{RSS(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X^{*}_{(i+m)}-X^{*}_{(i-m)}\right)\right\}, \qquad (12)$$
where
$$c_i = \begin{cases} 1 + \dfrac{1}{2}, & 1 \le i \le m,\\[4pt] 2, & m+1 \le i \le n-m,\\[4pt] 1 + \dfrac{1}{2}, & n-m+1 \le i \le n, \end{cases}$$
and $X^{*}_{(i-m)} = X^{*}_{(1)}$ for $i \le m$ and $X^{*}_{(i+m)} = X^{*}_{(n)}$ for $i \ge n-m$. Comparing (11) with (12), we have
$$\begin{aligned}
AHE_{RSS(m,n)} &= \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X^{*}_{(i+m)}-X^{*}_{(i-m)}\right)\right\}
 = HV_{RSS(m,n)} + \frac{1}{n}\sum_{i=1}^{n}\log\frac{2}{c_i}\\
&= HV_{RSS(m,n)} + \frac{2}{n}\, m \log\frac{4}{3}. \qquad (13)
\end{aligned}$$

Theorem 2. Let $X^{*}_{(1)}, X^{*}_{(2)}, \ldots, X^{*}_{(n)}$ be an RSS of size n from a distribution function F(x). Then $AHE_{RSS(m,n)} > HV_{RSS(m,n)}$.

Proof. The result follows directly from (13).
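Since (12) applies the same weighted-window formula to the ordered RSS units, the hypothetical ahe_srs and ranked_set_sample sketches from above can be combined directly:

```python
import numpy as np

rng = np.random.default_rng(2)
x_rss = ranked_set_sample(rng, 30, lambda r, k: r.exponential(size=k))
print(ahe_srs(x_rss, m=5))   # plays the role of AHE_RSS(5, 30) in Eq. (12)
```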
2.3. Using DRSS

Al-Saleh and Al-Kadiri [24] suggested the double ranked set sampling (DRSS) method for estimating the population mean. The DRSS can be described as follows:
(1) Randomly select n² samples, each of size n, from the target population.
(2) Apply the RSS method to the n² samples in Step 1. This step yields n samples, each of size n.
(3) Reapply the RSS method to the n samples obtained in Step 2 to obtain a DRSS sample of size n.
The cycle can be repeated h times if needed to obtain a sample of size hn units.
Let $X^{**}_{(1)}, X^{**}_{(2)}, \ldots, X^{**}_{(n)}$ be a DRSS sample of size n. The Vasicek estimator using DRSS is defined as
$$HV_{DRSS(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{2m}\left(X^{**}_{(i+m)}-X^{**}_{(i-m)}\right)\right\}, \qquad (14)$$
see [23].
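A hedged sketch of the three-step procedure above, reusing the hypothetical ranked_set_sample helper from Section 2.2:

```python
import numpy as np

def double_ranked_set_sample(rng, n, draw):
    """DRSS of size n: Steps 1-2 build n ranked set samples (using n^2 simple
    random samples of size n), and Step 3 applies RSS to them once more."""
    first_stage = [ranked_set_sample(rng, n, draw) for _ in range(n)]
    drss = np.empty(n)
    for i in range(n):
        drss[i] = np.sort(first_stage[i])[i]   # i-th order statistic of the i-th set
    return drss
```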
The proposed DRSS estimator of entropy is defined as
$$AHE_{DRSS(m,n)} = \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X^{**}_{(i+m)}-X^{**}_{(i-m)}\right)\right\}, \qquad (15)$$
where
$$c_i = \begin{cases} 1 + \dfrac{1}{2}, & 1 \le i \le m,\\[4pt] 2, & m+1 \le i \le n-m,\\[4pt] 1 + \dfrac{1}{2}, & n-m+1 \le i \le n, \end{cases}$$
and $X^{**}_{(i-m)} = X^{**}_{(1)}$ for $i \le m$ and $X^{**}_{(i+m)} = X^{**}_{(n)}$ for $i \ge n-m$. Comparing (14) with (15), we have
$$\begin{aligned}
AHE_{DRSS(m,n)} &= \frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{n}{c_i m}\left(X^{**}_{(i+m)}-X^{**}_{(i-m)}\right)\right\}
 = HV_{DRSS(m,n)} + \frac{1}{n}\sum_{i=1}^{n}\log\frac{2}{c_i}\\
&= HV_{DRSS(m,n)} + \frac{2}{n}\, m \log\frac{4}{3}. \qquad (16)
\end{aligned}$$
Theorem 3. Let $X^{**}_{(1)}, X^{**}_{(2)}, \ldots, X^{**}_{(n)}$ be a DRSS sample from a distribution function F(x). Then $AHE_{DRSS(m,n)} > HV_{DRSS(m,n)}$.

Proof. The result follows directly from (16).

Since the entropy estimators are functions of order statistics, the estimators based on RSS and DRSS involve ordering the RSS units; for more about ordered RSS, see [25]. The following theorem establishes the consistency of the suggested estimators $AHE_{SRS(m,n)}$, $AHE_{RSS(m,n)}$ and $AHE_{DRSS(m,n)}$.

Theorem 4. Let Ω be the class of continuous densities with finite entropies and let $X_1, X_2, \ldots, X_n$ be a random sample from $f \in \Omega$. If n → ∞, m → ∞ and m/n → 0, then
(1) $AHE_{SRS(m,n)} \stackrel{P}{\longrightarrow} H(f)$,
(2) $AHE_{RSS(m,n)} \stackrel{P}{\longrightarrow} H(f)$,
(3) $AHE_{DRSS(m,n)} \stackrel{P}{\longrightarrow} H(f)$.

Proof. To prove (1), recall that Vasicek [4] showed that $HV_{SRS(m,n)} \stackrel{P}{\longrightarrow} H(f)$; the result then follows from (10). The same approach proves (2) and (3), based on the RSS and DRSS sample units, using (13) and (16), respectively.

3. Numerical comparison between HV(m,n) and AHE(m,n) estimators

In this section, a simulation study is conducted to compare the suggested entropy estimators with the Vasicek [4] entropy estimator using the SRS, RSS and DRSS methods. The estimators are compared in terms of their root mean square errors (RMSEs) and bias values. For each setting, 10 000 samples of sizes n = 10 (m = 2, 3), n = 20 (m = 4, 5, 6) and n = 30 (m = 7, 8, 9, 10, 11) are generated from the uniform, exponential and standard normal distributions. Simulation results are presented in Tables 1–5. The problem of choosing the optimal value of m for a given value of n is still open in the field of entropy estimation.
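A hedged sketch of the Monte Carlo comparison in this section is given below; it uses fewer replications than the 10 000 in the tables, and the function names and sampler conventions are our own (the entropy estimators are the hypothetical sketches from Section 2).

```python
import numpy as np

def rmse_and_bias(estimator, true_h, sampler, n, m, reps=2000, seed=0):
    """Monte Carlo RMSE and bias of an entropy estimator."""
    rng = np.random.default_rng(seed)
    est = np.array([estimator(sampler(rng, n), m) for _ in range(reps)])
    err = est - true_h
    return np.sqrt(np.mean(err ** 2)), np.mean(err)

# SRS from Uniform(0, 1), whose true entropy is H(f) = 0
uniform_srs = lambda rng, n: rng.uniform(size=n)
print(rmse_and_bias(vasicek_entropy, 0.0, uniform_srs, n=20, m=5))
print(rmse_and_bias(ahe_srs, 0.0, uniform_srs, n=20, m=5))
```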
Table 1: Root mean square error and bias values of the entropy estimators HV(m,n) and AHE(m,n) for the uniform distribution with H(f) = 0 using SRS and RSS.

| n  | m  | HVSRS Bias | HVSRS RMSE | AHESRS Bias | AHESRS RMSE | HVRSS Bias | HVRSS RMSE | AHERSS Bias | AHERSS RMSE |
|----|----|------------|------------|-------------|-------------|------------|------------|-------------|-------------|
| 10 | 2  | −0.415135 | 0.452358 | −0.298609 | 0.350332 | −0.304078 | 0.329233 | −0.189664 | 0.228762 |
| 10 | 3  | −0.422613 | 0.453818 | −0.249056 | 0.298944 | −0.327681 | 0.343991 | −0.154894 | 0.186380 |
| 20 | 4  | −0.260596 | 0.274678 | −0.144016 | 0.167779 | −0.214042 | 0.222524 | −0.100304 | 0.118284 |
| 20 | 5  | −0.276800 | 0.288985 | −0.133179 | 0.157805 | −0.235141 | 0.242179 | −0.091608 | 0.108584 |
| 20 | 6  | −0.299321 | 0.310256 | −0.125960 | 0.150733 | −0.258899 | 0.264554 | −0.085981 | 0.101365 |
| 30 | 7  | −0.226688 | 0.233521 | −0.092957 | 0.109089 | −0.200036 | 0.204048 | −0.066053 | 0.077716 |
| 30 | 8  | −0.242599 | 0.248992 | −0.089259 | 0.105818 | −0.217704 | 0.221309 | −0.064713 | 0.076188 |
| 30 | 9  | −0.259471 | 0.265356 | −0.087074 | 0.103535 | −0.235661 | 0.238850 | −0.062931 | 0.073734 |
| 30 | 10 | −0.276934 | 0.282548 | −0.085151 | 0.102071 | −0.254437 | 0.257257 | −0.062044 | 0.072402 |
| 30 | 11 | −0.295302 | 0.300725 | −0.841357 | 0.101314 | −0.273700 | 0.276336 | −0.062243 | 0.072977 |
Table 2: Root mean square error and bias values of the entropy estimators HV(m,n) and AHE(m,n) for the exponential distribution with H(f) = 1 using SRS and RSS.

| n  | m  | HVSRS Bias | HVSRS RMSE | AHESRS Bias | AHESRS RMSE | HVRSS Bias | HVRSS RMSE | AHERSS Bias | AHERSS RMSE |
|----|----|------------|------------|-------------|-------------|------------|------------|-------------|-------------|
| 10 | 2  | −0.442683 | 0.571820 | −0.323532 | 0.483573 | −0.337494 | 0.404667 | −0.220406 | 0.315220 |
| 10 | 3  | −0.435444 | 0.561640 | −0.265713 | 0.443276 | −0.332760 | 0.401125 | −0.159787 | 0.276197 |
| 20 | 4  | −0.256116 | 0.352810 | 0.141143 | 0.279706 | −0.210620 | 0.259248 | −0.098056 | 0.179990 |
| 20 | 5  | −0.262412 | 0.358638 | 0.118697 | 0.271887 | −0.214122 | 0.265246 | −0.072456 | 0.172661 |
| 20 | 6  | −0.26565 | 0.360325 | 0.090043 | 0.263318 | −0.218028 | 0.272315 | −0.048075 | 0.168086 |
| 30 | 7  | −0.191094 | 0.275374 | −0.058550 | 0.205261 | −0.161705 | 0.206226 | −0.027194 | 0.130283 |
| 30 | 8  | −0.195662 | 0.280589 | −0.036080 | 0.200329 | −0.164468 | 0.212265 | −0.010631 | 0.136358 |
| 30 | 9  | −0.196983 | 0.282040 | −0.021144 | 0.202056 | −0.165511 | 0.217222 | −0.006685 | 0.138626 |
| 30 | 10 | −0.197171 | 0.283394 | −0.005890 | 0.204787 | −0.167152 | 0.220237 | 0.024904 | 0.145306 |
| 30 | 11 | −0.198853 | 0.286241 | 0.008492 | 0.207709 | −0.173076 | 0.229318 | 0.039837 | 0.154215 |
Table 3: Root mean square error and bias values of the entropy estimators HV(m,n) and AHE(m,n) for the standard normal distribution with H(f) = 1.419 using SRS and RSS.

| n  | m  | HVSRS Bias | HVSRS RMSE | AHESRS Bias | AHESRS RMSE | HVRSS Bias | HVRSS RMSE | AHERSS Bias | AHERSS RMSE |
|----|----|------------|------------|-------------|-------------|------------|------------|-------------|-------------|
| 10 | 2  | −0.521455 | 0.591007 | −0.409842 | 0.496627 | −0.422169 | 0.471157 | −0.308706 | 0.375690 |
| 10 | 3  | −0.563002 | 0.623188 | −0.386562 | 0.468471 | −0.462240 | 0.504378 | −0.291133 | 0.353844 |
| 20 | 4  | −0.327070 | 0.372436 | −0.214227 | 0.279269 | −0.285331 | 0.318855 | −0.168035 | 0.219922 |
| 20 | 5  | −0.352658 | 0.395796 | −0.205782 | 0.272804 | −0.305555 | 0.337744 | −0.160392 | 0.213700 |
| 20 | 6  | −0.375996 | 0.416964 | −0.203268 | 0.269194 | −0.335066 | 0.365185 | −0.162263 | 0.216405 |
| 30 | 7  | −0.269724 | 0.305134 | −0.132038 | 0.196792 | −0.241325 | 0.268228 | −0.105796 | 0.158654 |
| 30 | 8  | −0.285713 | 0.321039 | −0.129915 | 0.193509 | −0.254983 | 0.282376 | −0.102504 | 0.157726 |
| 30 | 9  | −0.304064 | 0.337563 | −0.131105 | 0.198239 | −0.274697 | 0.301420 | −0.103392 | 0.160749 |
| 30 | 10 | −0.320051 | 0.352764 | −0.130086 | 0.196928 | −0.295057 | 0.319933 | −0.101392 | 0.160593 |
| 30 | 11 | −0.339131 | 0.369866 | −0.127890 | 0.196985 | −0.314201 | 0.339141 | −0.102034 | 0.161378 |
Therefore, we used the heuristic formula $m = \sqrt{n} + 0.5$ suggested by Grzegorzewski and Wieczorkowski [26] for selecting m and computing the RMSEs of the entropy estimators. The results given in Table 6 are computed based on the quantity
$$R_K = \frac{HV_{K(m,n)} - AHE_{K(m,n)}}{HV_{K(m,n)}} \times 100, \quad K = \mathrm{SRS}, \mathrm{RSS}, \mathrm{DRSS}, \qquad (17)$$
for the uniform, exponential and standard normal distributions, which illustrates the performance of the RMSEs of the suggested estimators using SRS, RSS, and DRSS.
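Reading HV and AHE in (17) as their RMSEs, which is how the tabulated values appear to be obtained, the first entry of Table 6 can be reproduced directly:

```python
# R_K of Eq. (17) from the RMSEs in Table 1 (n = 10, m = 2, uniform, SRS)
rmse_hv, rmse_ahe = 0.452358, 0.350332
r_srs = (rmse_hv - rmse_ahe) / rmse_hv * 100
print(r_srs)   # about 22.55, matching the first entry of Table 6
```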
Based on Tables 1–6, we can conclude the following.
• The suggested estimator AHE(m,n) has a smaller RMSE than HV(m,n) for all cases considered in this study. For example, for n = 20 and window size m = 5 for the uniform distribution with H(f) = 0, the RMSE of AHE(m,n) is 0.157805 with bias −0.133179, while the RMSE of HV(m,n) is 0.288985 with bias −0.276800 using the SRS method.
• The estimator AHE(m,n) based on DRSS is more efficient than its counterpart using SRS. As an example, for n = 30 and m = 8 for the standard normal distribution with H(f) = 1.419, the RMSE values using SRS and DRSS are 0.193509 and 0.145579, respectively.
Table 4: Root mean square error and bias values of the entropy estimators HV(m,n) and AHE(m,n) for the uniform distribution with H(f) = 0 and the exponential distribution with H(f) = 1 using DRSS.

| n  | m  | Uniform HVDRSS Bias | Uniform HVDRSS RMSE | Uniform AHEDRSS Bias | Uniform AHEDRSS RMSE | Exponential HVDRSS Bias | Exponential HVDRSS RMSE | Exponential AHEDRSS Bias | Exponential AHEDRSS RMSE |
|----|----|------|------|------|------|------|------|------|------|
| 10 | 2  | −0.260621 | 0.278731 | −0.145388 | 0.176159 | −0.288898 | 0.340618 | −0.173991 | 0.251460 |
| 10 | 3  | −0.296104 | 0.306116 | −0.122180 | 0.144286 | −0.300393 | 0.351750 | −0.128545 | 0.223802 |
| 20 | 4  | −0.197693 | 0.204342 | −0.082268 | 0.096978 | −0.190904 | 0.229986 | −0.075338 | 0.179771 |
| 20 | 5  | −0.220876 | 0.225845 | −0.077708 | 0.091093 | −0.197900 | 0.239789 | −0.052175 | 0.145269 |
| 20 | 6  | −0.247733 | 0.251580 | −0.075071 | 0.086966 | −0.207032 | 0.251002 | −0.026183 | 0.146832 |
| 30 | 7  | −0.191854 | 0.194940 | −0.058041 | 0.067650 | −0.150245 | 0.188158 | −0.046556 | 0.115023 |
| 30 | 8  | −0.209886 | 0.212509 | −0.056421 | 0.065369 | −0.153441 | 0.194332 | −0.001239 | 0.120306 |
| 30 | 9  | −0.229010 | 0.231261 | −0.056053 | 0.064628 | −0.157250 | 0.199936 | 0.012716 | 0.124585 |
| 30 | 10 | −0.248006 | 0.249993 | −0.056843 | 0.064868 | −0.162854 | 0.208891 | 0.029477 | 0.133242 |
| 30 | 11 | −0.267506 | 0.269188 | −0.056931 | 0.064430 | −0.163540 | 0.213175 | 0.045951 | 0.145582 |
Table 5: Root mean square error and bias values of the entropy estimators HV(m,n) and AHE(m,n) for the standard normal distribution with H(f) = 1.419 using DRSS.

| n  | m  | HVDRSS Bias | HVDRSS RMSE | AHEDRSS Bias | AHEDRSS RMSE |
|----|----|-------------|-------------|--------------|--------------|
| 10 | 2  | −0.373395 | 0.412666 | −0.262149 | 0.316029 |
| 10 | 3  | −0.427401 | 0.459119 | −0.254450 | 0.303820 |
| 20 | 4  | −0.262789 | 0.290545 | −0.148107 | 0.194728 |
| 20 | 5  | −0.291340 | 0.317967 | −0.145734 | 0.191755 |
| 20 | 6  | −0.316105 | 0.341597 | −0.147800 | 0.195946 |
| 30 | 7  | −0.231613 | 0.255278 | −0.095517 | 0.143483 |
| 30 | 8  | −0.247340 | 0.271084 | −0.094560 | 0.145579 |
| 30 | 9  | −0.268298 | 0.291044 | −0.091548 | 0.145394 |
| 30 | 10 | −0.286538 | 0.308661 | −0.094236 | 0.149024 |
| 30 | 11 | −0.305310 | 0.326485 | −0.093843 | 0.150300 |
Table 6: The RK values of AHE(m,n) with respect to HV(m,n) using SRS, RSS and DRSS.

| n  | m  | Uniform (H(f)=0) SRS | Uniform RSS | Uniform DRSS | Exponential (H(f)=1) SRS | Exponential RSS | Exponential DRSS | Std. normal (H(f)=1.419) SRS | Std. normal RSS | Std. normal DRSS |
|----|----|------|------|------|------|------|------|------|------|------|
| 10 | 2  | 22.554260 | 30.516686 | 36.799638 | 15.432654 | 22.103853 | 26.175364 | 15.969354 | 20.262250 | 23.417728 |
| 10 | 3  | 34.126897 | 45.818350 | 52.865580 | 21.074710 | 31.144406 | 36.374698 | 24.826698 | 29.845470 | 33.825435 |
| 20 | 4  | 38.917933 | 46.844385 | 52.541328 | 20.720501 | 30.572271 | 21.833938 | 25.015573 | 31.027583 | 32.978368 |
| 20 | 5  | 45.393360 | 55.163743 | 59.665700 | 24.189015 | 34.905333 | 39.417988 | 31.074594 | 36.727225 | 39.693427 |
| 20 | 6  | 51.416572 | 61.684571 | 65.432069 | 26.922084 | 38.275159 | 41.501661 | 35.439510 | 40.740994 | 42.638255 |
| 30 | 7  | 53.285144 | 61.912883 | 65.297015 | 25.461009 | 36.825134 | 38.868929 | 35.506368 | 40.851067 | 43.793433 |
| 30 | 8  | 57.501446 | 65.573926 | 69.239420 | 28.604115 | 35.760488 | 38.092543 | 39.724146 | 44.143270 | 46.297458 |
| 30 | 9  | 60.982605 | 69.129579 | 72.054086 | 28.359098 | 36.182339 | 37.687560 | 41.273481 | 46.669431 | 50.043980 |
| 30 | 10 | 63.874811 | 71.856159 | 74.052074 | 27.737708 | 34.022894 | 36.214581 | 44.175710 | 49.804178 | 51.719200 |
| 30 | 11 | 66.310084 | 73.591208 | 76.065055 | 27.435622 | 32.750591 | 31.707752 | 46.741523 | 52.415662 | 53.964194 |
• The estimator AHE(m,n) based on DRSS is more efficient than the RSS estimator for the uniform and standard normal distributions. As an example, for n = 30 and m = 8 for the standard normal distribution with H(f) = 1.419, the RMSE values using RSS and DRSS are 0.157726 and 0.145579, respectively.
• The performance of AHE(m,n) depends on both the underlying distribution and H(f). For example, with n = 20 and m = 5, the RMSE values using RSS for the uniform, exponential and standard normal distributions are 0.108584, 0.172661 and 0.213700, respectively. Among all cases considered in this study, the suggested AHE(m,n) works best for the uniform distribution with H(f) = 0.
• Among the three distributions considered in this study (uniform, exponential and standard normal), the AHE(m,n) estimator performs best for the uniform distribution. The same holds for the HV(m,n) estimator.
• The RK values based on SRS, RSS and DRSS show that the AHE(m,n) estimators are more efficient than HV(m,n), and that DRSS is more efficient than SRS. Also, the suggested DRSS estimator is more efficient than the suggested RSS estimator for all cases except for the exponential distribution when (n = 20, m = 4) and (n = 30, m = 11).
4. Conclusions

In this paper, three new entropy estimators are suggested using SRS, RSS and DRSS. These estimators are compared with the estimator suggested by Vasicek [4]. It is found that the suggested estimators are more efficient than Vasicek's estimator. Also, the suggested estimator using DRSS is superior to its counterparts using SRS and RSS. This motivates us to consider these estimators in future work using a multistage RSS method.

Acknowledgments

The author thanks the two anonymous referees for their helpful comments that substantially improved this paper.

References
[1] C.E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (3, 4) (1948) 379–423, 623–656.
[2] N. Ebrahimi, K. Pflughoeft, E. Soofi, Two measures of sample entropy, Statistics & Probability Letters 20 (1994) 225–234.
[3] J.C. Correa, A new estimator of entropy, Communications in Statistics—Theory and Methods 24 (10) (1995) 2439–2449.
[4] O. Vasicek, A test for normality based on sample entropy, Journal of the Royal Statistical Society, Series B 38 (1976) 54–59.
[5] H.A. Noughabi, R.A. Noughabi, On the entropy estimators, Journal of Statistical Computation and Simulation 83 (4) (2013) 784–792.
[6] B. Van Es, Estimating functionals related to a density by a class of statistics based on spacings, Scandinavian Journal of Statistics 19 (1992) 61–72.
[7] H.A. Noughabi, N.R. Arghami, A new estimator of entropy, Journal of the Iranian Statistical Society 9 (1) (2010) 53–64.
[8] B. Choi, K. Kim, S.H. Song, Goodness of fit test for exponentiality based on Kullback–Leibler information, Communications in Statistics—Simulation and Computation 33 (2) (2004) 525–536.
[9] S. Park, D. Park, Correcting moments for goodness of fit tests based on two entropy estimates, Journal of Statistical Computation and Simulation 73 (9) (2003) 685–694.
[10] M.N. Goria, N.N. Leonenko, V.V. Mergel, P.L. Novi Inverardi, A new class of random vector entropy estimators and its applications in testing statistical hypotheses, Journal of Nonparametric Statistics 17 (3) (2005) 277–297.
[11] J. Theil, The entropy of maximum entropy distribution, Economics Letters 5 (2) (1980) 145–148.
[12] H.N. Alizadeh, A new estimator of entropy and its application in testing normality, Journal of Statistical Computation and Simulation 80 (10) (2010) 1151–1162.
[13] B. Choi, Improvement of goodness of fit test for normal distribution based on entropy and power comparison, Journal of Statistical Computation and Simulation 78 (9) (2008) 781–788.
[14] G.A. McIntyre, A method for unbiased selective sampling using ranked sets, Australian Journal of Agricultural Research 3 (1952) 385–390.
[15] K. Takahasi, K. Wakimoto, On the unbiased estimates of the population mean based on the sample stratified by means of ordering, Annals of the Institute of Statistical Mathematics 20 (1) (1968) 1–31.
[16] A.I. Al-Omari, A.A. Jemain, K. Ibrahim, New ratio estimators of the mean using simple random sampling and ranked set sampling methods, Revista Investigación Operacional 30 (2) (2009) 97–108.
[17] A. Haq, J. Shabbir, A family of ratio estimators for population mean in extreme ranked set sampling using two auxiliary variables, SORT: Statistics and Operations Research Transactions 34 (1) (2010) 45–64.
[18] A.I. Al-Omari, Ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling, Statistics & Probability Letters 82 (11) (2012) 1883–1890.
[19] M.F. Al-Saleh, A.I. Al-Omari, Multistage ranked set sampling, Journal of Statistical Planning and Inference 102 (2) (2002) 273–286.
[20] A. Gaur, K.K. Mahajan, S. Arora, A nonparametric test for a multi-sample scale problem using ranked-set data, Statistical Methodology 10 (2013) 85–92.
[21] A.I. Al-Omari, Estimation of mean based on modified robust extreme ranked set sampling, Journal of Statistical Computation and Simulation 81 (8) (2011) 1055–1066.
[22] A.I. Al-Omari, A.D. Al-Nasser, Statistical quality control limits for the sample mean chart using robust extreme ranked set sampling, Economic Quality Control 26 (1) (2011) 73–89.
[23] M. Mahdizadeh, On the use of ranked set samples in entropy based test of fit for the Laplace distribution, Revista Colombiana de Estadística 35 (3) (2012) 443–455.
[24] M.F. Al-Saleh, M.A. Al-Kadiri, Double ranked set sampling, Statistics & Probability Letters 48 (2) (2000) 205–212.
[25] O. Ozturk, Statistical inference under a stochastic ordering constraint in ranked set sampling, Nonparametric Statistics 19 (3) (2007) 131–144.
[26] R. Wieczorkowski, P. Grzegorzewsky, Entropy estimators improvements and comparisons, Communications in Statistics—Simulation and Computation 28 (2) (1999) 541–567.