Journal of the Korean Statistical Society
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/jkss
Parametric inference based on judgment post stratified samples
Omer Ozturk a, K.S. Sultan b,*, M.E. Moshref c
a The Ohio State University, Department of Statistics, 1958 Neil Avenue, Columbus, OH 43210, United States
b King Saud University, Department of Statistics and OR, Riyadh 11451, Saudi Arabia
c Department of Mathematics, Faculty of Science, Al-Azhar University, Nasr City, Cairo 11884, Egypt
Article info
Article history: Received 9 June 2016; Accepted 24 August 2017; Available online xxxx
AMS 2000 subject classifications: 62D05, 94A20, 62G05
Abstract
In this paper, we consider a judgment post stratified (JPS) sample of set size $H$ from a location and scale family of distributions. In a JPS sample, the ranks of the measured units are random variables. By conditioning on these ranks, we derive the maximum likelihood estimators (MLEs) and best linear unbiased estimators (BLUEs) of the location and scale parameters. Since the ranks are random variables, we use the conditional distribution of the ranks given the measured observations to construct Rao-Blackwellized versions of the MLEs and BLUEs. We show that the Rao-Blackwellized estimators always have smaller mean squared errors than the MLEs and BLUEs in a JPS sample. In addition, the paper provides empirical evidence for the efficiency of the proposed estimators through a series of Monte Carlo simulations. © 2017 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
1. Introduction

In settings where abundant auxiliary information is available, a judgment post-stratified (JPS) sample provides improved efficiency over its competitor, the simple random sampling design. A JPS sample starts with a simple random sample (SRS) and uses available auxiliary information from additional sample units, through a ranking process, to determine the relative positions of the measured units in sets of size $H$.

To construct a JPS sample from a population $F$, the experimenter first selects a simple random sample of size $n$ and measures all of its units. For each measured unit $X_i$, $i = 1, \ldots, n$, an additional $H - 1$ units are selected to form a set of size $H$. Units in this set are ranked from smallest to largest without measurement, and the rank $R_i$ of the measured unit $X_i$ is recorded. We assume that the rank of $X_i$ in its set is determined without error. The JPS sample then contains $n$ fully measured units $X = (X_1, \ldots, X_n)$ and the $n$ ranks $R = (R_1, \ldots, R_n)$ associated with these units as additional information.

This ranking information provides a mechanism to create a post stratified sample by putting homogeneous observations together in the same stratum. A JPS sample in this setting can be considered a post stratified sample, since stratification is performed after a simple random sample has been collected. Increased efficiency of a JPS sample is then anticipated from the general theory of stratified sampling in survey sampling designs. The main difference between the JPS sample and the post-stratified sample is that the post stratified sample requires a well-defined stratification variable, on which the measured units are then post stratified. The JPS sample, on the other hand, does not require a standard stratification variable. Post stratification can be done using visual inspection of units, judgment ranking, etc. It requires only a monotonic relationship between the ranking information and the variable of interest.
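The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `draw_jps_sample` and `class_counts` are hypothetical, the parent distribution is taken to be normal with mean 10 and scale 5, and perfect ranking is emulated by comparing the latent values of the comparison units, which in a real study would be ranked by judgment and never measured.

```python
import random

def draw_jps_sample(n, H, draw, rng):
    """Return (x, ranks): n measured values and their ranks in sets of size H."""
    x, ranks = [], []
    for _ in range(n):
        xi = draw(rng)                               # fully measured unit X_i
        others = [draw(rng) for _ in range(H - 1)]   # H - 1 comparison units
        # Assumption: perfect ranking, emulated by comparing latent values.
        ranks.append(1 + sum(y < xi for y in others))
        x.append(xi)
    return x, ranks

def class_counts(ranks, H):
    """Judgment class sample sizes (N_1, ..., N_H)."""
    return [sum(r == h for r in ranks) for h in range(1, H + 1)]

rng = random.Random(2017)
x, ranks = draw_jps_sample(n=10, H=3, draw=lambda g: g.gauss(10.0, 5.0), rng=rng)
counts = class_counts(ranks, H=3)
print(list(zip([round(v, 2) for v in x], ranks)))
print(counts)  # entries sum to n; individual entries are random and may be zero
```

The class counts are random because the ranks are recorded only after the simple random sample has been drawn.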
* Corresponding author. E-mail address: [email protected] (K.S. Sultan).
http://dx.doi.org/10.1016/j.jkss.2017.08.002 1226-3192/© 2017 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
Please cite this article in press as: Ozturk, O., et al., Parametric inference based on judgment post stratified samples. Journal of the Korean Statistical Society (2017), http://dx.doi.org/10.1016/j.jkss.2017.08.002.
There is a close connection between a JPS sample and a ranked set sample (RSS). Conditionally on the rank vector $R$, a JPS sample becomes an unbalanced RSS. To construct an unbalanced RSS, one specifies a set size $H$ and a vector of judgment class sample sizes, $n = (n_1, \ldots, n_H)$, where $n_h$, $h = 1, 2, \ldots, H$, indicates the number of units with rank $h$ to be selected for measurement. One then draws $N$ independent simple random samples (sets) of size $H$ and ranks the units within each set from smallest to largest using some method other than actual measurement. The $h$th judgment ranked unit is measured in $n_h$ sets, so that $\sum_{h=1}^{H} n_h = N$. The total sample then consists of $N = \sum_{h=1}^{H} n_h$ independent order statistics, or judgment order statistics, selected from $H$ different judgment classes.

The descriptions of the RSS and JPS designs reveal two major differences between them. The first difference is in the nature of the ranking process. In an RSS, the ranking information is used prior to measurement to determine which ranked unit should be selected for full measurement in each set. The judgment rank and the measured observation in a set are strongly attached to each other, and this attachment cannot be broken. Hence, an RSS must be analyzed under this data structure, and procedures developed for simple random samples cannot be used. This could be a problem in settings where data are collected for multiple purposes and an inferential procedure for RSS has not yet been developed. To address this deficiency, MacEachern, Stasny, and Wolfe (2004) proposed the JPS sample, which uses the same ranking information as RSS but uses it post-experimentally, after a simple random sample has been collected. As a result, researchers who use JPS retain the option of using SRS-based inferential procedures. The second difference is due to the distributional properties of the sample size vectors in the JPS and RSS sampling designs.
The sample size vector of judgment classes in an RSS, $n = (n_1, \ldots, n_H)$, is a deterministic vector and must be determined prior to the construction of the sample, while it is a random vector in a JPS sample. A JPS sample may be thought of as a randomized version of an unbalanced RSS. It is equivalent to drawing $N$ independent simple random samples of size $H$, ranking the units in each sample from smallest to largest, and then deciding at random which one unit in each sample to measure. Since the sample size vector $N^\top = (N_1, \ldots, N_H)$ is a random vector, where $N_h$ is the number of measured units having rank $h$ in the sample, its entries may be severely unbalanced and may include zeros. Zero entries in $N$ not only reduce efficiency but may also lead to invalid inference, including biased estimators, inflated Type I error rates in tests and deflated coverage probabilities in confidence intervals.

Many authors have investigated ranked set and JPS samples to draw statistical inference. Du and MacEachern (2008) have provided several estimators for the contrast parameters in a designed experiment in a JPS setting. Balakrishnan and Li (2008), Barnett and Moore (1997), Ozturk (2011) and Stokes (1995) have developed parametric inference for the parameters of a location-scale family of distributions based on an RSS. Frey and Ozturk (2011) have developed constrained estimation using judgment post-stratification. Frey and Feeman (2012) have suggested an improved mean estimator for judgment post-stratification, and Frey and Feeman (2013) have considered the problem of estimating the variance of a population using judgment post-stratification. Ozturk (2012) has combined the ranking information in judgment post stratified and ranked set sampling designs. Ozturk (2014) has investigated statistical inference for population quantiles and variance in judgment post-stratified samples.
Wang, Stokes, Lim, and Chen (2006) have developed a class of estimators for the population mean based on concomitants of multivariate order statistics. Wang, Lim, and Stokes (2008) have used a stochastic ordering constraint to construct estimators for the population mean. Chen, Ahn, Wang, and Lim (2014), Ozturk (2013) and Stokes, Wang, and Chen (2007) combined ranking information from different sources to develop statistical inference for population characteristics under a JPS sample. Up-to-date references for RSS and JPS samples can be found in Hollander, Wolfe, and Chicken (2014, Chapter 15) and Wolfe (2012).

In this paper, we propose parametric inference for a location and scale family of distributions $F$, $F(x; \theta) = F\left(\frac{x - \mu}{\sigma}\right)$, where $\theta^\top = (\mu, \sigma)$. Section 2 develops maximum likelihood estimators (MLEs) for the location ($\mu$) and scale ($\sigma$) parameters of the distribution $F$. Section 3 constructs the best linear unbiased estimator (BLUE) for $\theta$ by conditioning on the ranks of the measured observations. It is shown that the conditional variance of the BLUE depends on the sample size vector $N$. We construct unbiased estimators for the unconditional variance of the BLUE. Section 4 constructs Rao-Blackwellized estimators to improve the MLE and the BLUE. It is shown that the Rao-Blackwellized estimators always perform better than the regular MLE and BLUE. Section 5 provides empirical evidence for the proposed estimators for small sample sizes. Finally, Section 6 provides some concluding remarks.

2. Maximum likelihood estimator

Let $(X_j, R_j)$, $j = 1, \ldots, n$, be a JPS sample from a distribution $F$ in a location and scale family, with location parameter $\mu$ and scale parameter $\sigma$. We construct the maximum likelihood estimator of $\theta = (\mu, \sigma)$ based on the JPS sample. The conditional distribution of $X_i$ given $R_i = h$ is the same as that of the $h$th order statistic in a set of size $H$ [see Arnold, Balakrishnan, and Nagaraja (1992)]:

$$f_{X_i|R_i}(x; \theta|h) = f_{(h)}(x; \theta) = H \binom{H-1}{h-1} F^{h-1}(x; \theta)\{1 - F(x; \theta)\}^{H-h} f(x; \theta).$$

Since $R_i$ has a discrete uniform distribution on the integers $1, \ldots, H$, the joint distribution of $(X_i, R_i)$ is given by

$$f_{X_i,R_i}(x, h; \theta) = \frac{1}{H} f_{X_i|R_i}(x; \theta|h).$$
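As a quick numerical illustration of these two densities, the sketch below evaluates them for a normal parent distribution, which is an assumption made only for the example; the function names `order_stat_pdf` and `joint_pdf` are hypothetical. Summing the joint density over the $H$ possible ranks should recover the parent density, which the last lines check at one point.

```python
from math import comb
from statistics import NormalDist

def order_stat_pdf(x, h, H, mu=0.0, sigma=1.0):
    """Density of the h-th order statistic in a set of size H (normal parent)."""
    parent = NormalDist(mu, sigma)
    F = parent.cdf(x)
    return H * comb(H - 1, h - 1) * F ** (h - 1) * (1.0 - F) ** (H - h) * parent.pdf(x)

def joint_pdf(x, h, H, mu=0.0, sigma=1.0):
    """Joint density of (X_i, R_i): uniform rank probability times conditional density."""
    return order_stat_pdf(x, h, H, mu, sigma) / H

# Summing the joint density over h = 1, ..., H recovers the parent density.
H, x0 = 4, 11.3
marginal = sum(joint_pdf(x0, h, H, 10.0, 5.0) for h in range(1, H + 1))
parent_density = NormalDist(10.0, 5.0).pdf(x0)
print(marginal, parent_density)
```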
The likelihood function of the model under a JPS sample is then constructed by

$$L(\theta|x, R) = \prod_{h=1}^{H} \prod_{i=1}^{n} \left\{ \frac{1}{H} f_{X_i|R_i}(x_i; \theta|h) \right\}^{I(R_i = h)},$$

where $I(R_i = h) = 1$ if $R_i = h$ and zero otherwise, and $x = (x_1, \ldots, x_n)$. Let $Z_i = \frac{X_i - \mu}{\sigma}$, and let $f^*(z)$ and $F^*(z)$ be the probability density function (pdf) and cumulative distribution function (cdf) of $Z_i$, respectively. The likelihood function can then be written as

$$L(\theta|z; R) = \prod_{h=1}^{H} \prod_{i=1}^{n} \left\{ \frac{1}{H\sigma} f^*_{Z_i|R_i}(z_i|h) \right\}^{I(R_i = h)} = \prod_{h=1}^{H} \prod_{i=1}^{n} \left\{ \frac{1}{H\sigma} f^*_{(h)}(z_i) \right\}^{I(R_i = h)},$$

where $z = (z_1, \ldots, z_n)$ and $f^*_{(h)}$ is the density of the $h$th order statistic in a set of size $H$ from $f^*$. The maximum likelihood estimator maximizes $\log\{L(\theta|z; R)\}$ or, equivalently, solves the score equations. The score function is obtained by taking the partial derivatives of the log-likelihood function with respect to $\mu$ and $\sigma$:

$$S_n(z, R : \theta) = \frac{d}{d\theta} \log\{L(\theta|z; R)\} = \begin{bmatrix} -\dfrac{1}{\sigma} \displaystyle\sum_{h=1}^{H} \sum_{i=1}^{n} I(R_i = h) \dfrac{f^{*\prime}_{(h)}(z_i)}{f^*_{(h)}(z_i)} \\[2ex] -\dfrac{1}{\sigma} \displaystyle\sum_{h=1}^{H} \sum_{i=1}^{n} I(R_i = h) \left\{ 1 + z_i \dfrac{f^{*\prime}_{(h)}(z_i)}{f^*_{(h)}(z_i)} \right\} \end{bmatrix},$$

where $f^{*\prime}_{(h)}(y)$ is the derivative of $f^*_{(h)}(y)$ with respect to $y$. Let $\hat{\theta}_{MLE} = \hat{\theta}_{MLE}(R, x)$ be a quantity that solves $S_n(z, R : \hat{\theta}_{MLE}) = 0$.
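The log-likelihood above can be evaluated and maximized numerically. The following is a minimal sketch, assuming a normal parent distribution and perfect ranking; the function names, the small data set, and the crude pattern search are all illustrative choices and are not the solver used in the paper. Maximizing the log-likelihood directly is used here in place of solving the score equations.

```python
from math import comb, log, pi
from statistics import NormalDist, mean, stdev

STD = NormalDist()

def jps_loglik(mu, sigma, x, ranks, H):
    """log L(theta | z; R) for a normal parent (an assumption of this sketch)."""
    if sigma <= 0:
        return float("-inf")
    total = 0.0
    for xi, h in zip(x, ranks):
        z = (xi - mu) / sigma
        F = min(max(STD.cdf(z), 1e-300), 1.0 - 1e-16)   # keep both logs finite
        log_f = -0.5 * z * z - 0.5 * log(2.0 * pi)       # log of the standard normal pdf
        log_fh = (log(H) + log(comb(H - 1, h - 1)) + (h - 1) * log(F)
                  + (H - h) * log(1.0 - F) + log_f)      # log f*_(h)(z)
        total += log_fh - log(H * sigma)                 # factor 1 / (H sigma)
    return total

def pattern_search(x, ranks, H, mu0, sigma0, step=1.0, tol=1e-6):
    """Crude derivative-free maximizer over (mu, sigma); illustrative only."""
    mu, sigma = mu0, sigma0
    best = jps_loglik(mu, sigma, x, ranks, H)
    while step > tol:
        improved = False
        for dm, ds in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = jps_loglik(mu + dm, sigma + ds, x, ranks, H)
            if cand > best:
                mu, sigma, best, improved = mu + dm, sigma + ds, cand, True
        if not improved:
            step /= 2.0
    return mu, sigma

# Hypothetical JPS data with set size H = 3.
x = [3.1, 12.4, 8.7, 15.9, 10.2, 6.5, 13.3, 9.8]
ranks = [1, 3, 2, 3, 2, 1, 3, 2]
H = 3
start = (mean(x), stdev(x))
mu_hat, sigma_hat = pattern_search(x, ranks, H, *start)
print(mu_hat, sigma_hat)
```

Starting from the sample mean and standard deviation keeps the search short; any general-purpose optimizer could be substituted for `pattern_search`.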
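Then $\hat{\theta}_{MLE}$ is the MLE of $\theta$. For large sample sizes $n$, the variance of $\hat{\theta}_{MLE}$ is equal to the inverse of the Fisher information. Under the usual regularity conditions, the Fisher information is given by

$$\begin{aligned} I(\theta) &= -\sum_{h=1}^{H} \sum_{i=1}^{n} E\left[ I(R_i = h) \frac{d^2}{d\theta^2} \{\log f_{(h)}(z_i)\} \right] \\ &= \sum_{h=1}^{H} \sum_{i=1}^{n} E\left[ I(R_i = h)\, E\left\{ -\frac{d^2}{d\theta^2} \{\log f_{(h)}(z_i)\} \,\Big|\, R_i = h \right\} \right] \\ &= \sum_{h=1}^{H} E(N_h) I_{(h)}(\theta) = \frac{n}{H} \sum_{h=1}^{H} I_{(h)}(\theta), \end{aligned}$$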
where $I_{(h)}(\theta)$ is the Fisher information for the $h$th order statistic. The last equality in the above equation follows from the fact that $N_h$ has a binomial distribution with parameters $n$ and success probability $1/H$. Under the usual regularity conditions, $\sqrt{n}(\hat{\theta}_{MLE} - \theta)$ converges to a normal distribution with mean zero and variance-covariance matrix $nI^{-1}(\theta)$. A variance estimate of the MLE can be obtained from the observed Fisher information. Let

$$I_i(\hat{\theta}_{MLE}) = I(R_i = h) \frac{d^2}{d\theta^2} \{\log f_{(h)}(z_i)\} \Big|_{\hat{\theta}_{MLE}};$$

we then estimate $I(\hat{\theta}_{MLE})$ from

$$\bar{I}(\hat{\theta}_{MLE}) = \frac{1}{n} \sum_{i=1}^{n} I_i(\hat{\theta}_{MLE}).$$
For large $n$, $\bar{I}(\hat{\theta}_{MLE})$ converges to the Fisher information matrix $I^{-1}(\theta)$.

3. Best linear unbiased estimator

In this section, we consider the best linear unbiased estimator (BLUE) of $\theta$ by conditioning on the rank vector $R$. It is easy to observe that

$$E(Z_i|R_i) = \frac{E(X_i|R_i) - \mu}{\sigma} = E(Z_{(h)}) = \alpha_{R_i}, \quad \text{and} \quad \mathrm{Var}(Z_i|R_i) = \mathrm{Var}(Z_{(h)}) = \nu_{R_i},$$

where $\alpha_h$ and $\nu_h$ are the mean and variance of the $h$th standardized order statistic. By using the standardized mean and variance, we can write $E(X_i|R_i) = \sigma\alpha_{R_i} + \mu$ and $\mathrm{Var}(X_i|R_i) = \sigma^2\nu_{R_i}$.
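The constants $\alpha_h$ and $\nu_h$ depend only on the standardized parent distribution and the set size, so they can be computed once by numerical integration. The sketch below does this for a standard normal parent using a midpoint rule; the parent distribution, the integration limits and step count, and the function name `order_stat_moments` are assumptions made for illustration.

```python
from math import comb, sqrt, pi
from statistics import NormalDist

STD = NormalDist()

def order_stat_moments(h, H, lo=-10.0, hi=10.0, steps=20000):
    """Mean and variance of the h-th standard normal order statistic (midpoint rule)."""
    dz = (hi - lo) / steps
    c = H * comb(H - 1, h - 1)
    m1 = m2 = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * dz
        F = STD.cdf(z)
        w = c * F ** (h - 1) * (1.0 - F) ** (H - h) * STD.pdf(z) * dz  # f_(h)(z) dz
        m1 += z * w
        m2 += z * z * w
    return m1, m2 - m1 * m1

H = 2
alpha, nu = zip(*(order_stat_moments(h, H) for h in range(1, H + 1)))
print(alpha)  # for H = 2 the exact means are -1/sqrt(pi) and 1/sqrt(pi)
print(nu)     # for H = 2 both exact variances equal 1 - 1/pi
```

With these constants, the conditional mean of a measured unit with rank $h$ is $\mu + \sigma\alpha_h$ and its conditional variance is $\sigma^2\nu_h$.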
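Let

$$B^\top(R) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_{R_1} & \alpha_{R_2} & \cdots & \alpha_{R_n} \end{bmatrix} \quad \text{and} \quad \Sigma(R) = \mathrm{diag}\{\nu_{R_1}, \ldots, \nu_{R_n}\}.$$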
The BLUE of $\theta$ conditionally on the rank vector $R$ is then given by

$$\hat{\theta}_{BLUE} = \hat{\theta}_{BLUE}(R, X) = \left\{ B^\top(R) \Sigma^{-1}(R) B(R) \right\}^{-1} B^\top(R) \Sigma^{-1}(R) X.$$

The estimator $\hat{\theta}_{BLUE}$ can be derived using the results in Lloyd (1952) and Stokes (1995). It is easy to observe that the conditional expectation of $\hat{\theta}_{BLUE}(R, X)$ given $R$ is equal to $\theta^\top = (\mu, \sigma)$. For a given rank vector $R$ and data vector $X$, the BLUE simplifies to a form that depends only on the judgment class sample size vector $N^\top = (N_1, \ldots, N_H)$:

$$\hat{\theta}_{BLUE}(N, X) = \begin{bmatrix} \displaystyle\sum_{h=1}^{H} \frac{N_h}{\nu_h} & \displaystyle\sum_{h=1}^{H} \frac{N_h \alpha_h}{\nu_h} \\[2ex] \displaystyle\sum_{h=1}^{H} \frac{N_h \alpha_h}{\nu_h} & \displaystyle\sum_{h=1}^{H} \frac{N_h \alpha_h^2}{\nu_h} \end{bmatrix}^{-1} \begin{bmatrix} \displaystyle\sum_{h=1}^{H} \frac{\sum_{j=1}^{n} I(R_j = h) X_j}{\nu_h} \\[2ex] \displaystyle\sum_{h=1}^{H} \frac{\alpha_h \sum_{j=1}^{n} I(R_j = h) X_j}{\nu_h} \end{bmatrix} = A^{-1}(N)\, T(R, X).$$

The variance of $\hat{\theta}_{BLUE}(N, X)$ can be computed from the conditional variance formula

$$\mathrm{Var}(\hat{\theta}_{BLUE}(N, X)) = E_N\left\{ \mathrm{Var}(\hat{\theta}_{BLUE}(N, X)|R) \right\} + \mathrm{Var}_N\left\{ E(\hat{\theta}_{BLUE}(N, X)|R) \right\} = \sigma^2 E_N A^{-1}(N),$$

where $E_N$ indicates expectation with respect to the multinomial random variable $N$. The second term vanishes because the conditional expectation of the BLUE given $R$ equals $\theta$ for every $R$. The expectation $E_N A^{-1}(N)$ does not have an analytic expression. On the other hand, since $\alpha_h$ and $\nu_h$, $h = 1, \ldots, H$, are known under the parametric model, we can estimate $\mathrm{Var}(\hat{\theta}_{BLUE}(N, X))$ with a small-scale simulation study. Let $M$ be a fixed integer. We generate $N_m$, $m = 1, \ldots, M$, from a multinomial distribution with parameters $n$ and $(1/H, \ldots, 1/H)$ and compute

$$\hat{A}^{-1} = \frac{1}{M} \sum_{m=1}^{M} A^{-1}(N_m). \tag{1}$$
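The closed form $A^{-1}(N)\,T(R, X)$ and the Monte Carlo average in Eq. (1) can be sketched as follows. The constants $\alpha_h$ and $\nu_h$ are the exact standard normal values for $H = 2$, which is an assumption of the example, and all function names are hypothetical. The paper does not say how to handle draws of $N$ for which $A(N)$ is singular (fewer than two non-empty judgment classes); this sketch skips such draws, which is an assumption and slightly changes the quantity being averaged.

```python
import random
from math import sqrt, pi

def a_inverse(counts, alpha, nu):
    """Entries (11, 12, 22) of A^{-1}(N); None when fewer than two classes are non-empty."""
    if sum(c > 0 for c in counts) < 2:
        return None
    a11 = sum(c / v for c, v in zip(counts, nu))
    a12 = sum(c * a / v for c, a, v in zip(counts, alpha, nu))
    a22 = sum(c * a * a / v for c, a, v in zip(counts, alpha, nu))
    det = a11 * a22 - a12 * a12
    return a22 / det, -a12 / det, a11 / det

def blue(x, ranks, alpha, nu):
    """Conditional BLUE (mu_hat, sigma_hat) = A^{-1}(N) T(R, X)."""
    H = len(alpha)
    counts = [sum(r == h for r in ranks) for h in range(1, H + 1)]
    inv = a_inverse(counts, alpha, nu)
    if inv is None:
        raise ValueError("fewer than two judgment classes observed")
    t1 = sum(xi / nu[r - 1] for xi, r in zip(x, ranks))
    t2 = sum(alpha[r - 1] * xi / nu[r - 1] for xi, r in zip(x, ranks))
    return inv[0] * t1 + inv[1] * t2, inv[1] * t1 + inv[2] * t2

def mean_a_inverse(n, alpha, nu, M, rng):
    """Monte Carlo average of A^{-1}(N_m) as in Eq. (1); singular draws are skipped."""
    H = len(alpha)
    sums, used = [0.0, 0.0, 0.0], 0
    for _ in range(M):
        counts = [0] * H
        for _ in range(n):
            counts[rng.randrange(H)] += 1      # multinomial(n; 1/H, ..., 1/H)
        inv = a_inverse(counts, alpha, nu)
        if inv is not None:
            sums = [s + v for s, v in zip(sums, inv)]
            used += 1
    return [s / used for s in sums]

alpha = [-1 / sqrt(pi), 1 / sqrt(pi)]          # exact values for H = 2, normal parent
nu = [1 - 1 / pi, 1 - 1 / pi]
ranks = [1, 2, 2, 1, 2]
x = [10.0 + 5.0 * alpha[r - 1] for r in ranks]  # noise-free data: BLUE returns (10, 5)
mu_hat, sigma_hat = blue(x, ranks, alpha, nu)
a_inv = mean_a_inverse(n=10, alpha=alpha, nu=nu, M=2000, rng=random.Random(1))
print(mu_hat, sigma_hat, a_inv)
```

The noise-free data set is used only as a sanity check: when every observation equals its conditional mean, the generalized least squares fit reproduces $(\mu, \sigma)$ exactly.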
The variance estimate of the BLUE is then obtained from

$$\widehat{\mathrm{Var}}(\hat{\theta}_{BLUE}(N, X)) = \hat{\sigma}^2 \hat{A}^{-1}, \tag{2}$$
where $\hat{\sigma}^2$ is the sample variance of $X$. Empirical properties of these variance estimates are investigated in Section 5.

4. Rao-Blackwellized estimators of MLE and BLUE

In this section, we improve the MLE and the BLUE. A JPS sample uses the ranks of the measured $X$ variable in sets of size $H$. These ranks are computed based on a particular construction of $n$ sets of size $H$. By conditioning on the given values of $X$, we compute the conditional expectation of the MLE and the BLUE over all possible values of the rank vector $R$:

$$\hat{\theta}_{RMLE} = E_R\{\hat{\theta}_{MLE}(R, X|X)\} \tag{3}$$

$$\hat{\theta}_{RBLUE} = E_R\{\hat{\theta}_{BLUE}(R, X|X)\} \tag{4}$$
where $E_R$ is the expectation over the distribution of the ranks $R$ for the given value of the sample $X$. The estimators $\hat{\theta}_{RMLE}$ and $\hat{\theta}_{RBLUE}$ will be called the Rao-Blackwellized MLE and BLUE, respectively. By the Rao-Blackwell theorem, these estimators have smaller variances (or mean squared errors) than the MLE and the BLUE. Since the BLUE is unbiased, the RBLUE is also unbiased. On the other hand, the RMLE could be biased for small sample sizes, but it is asymptotically unbiased.

In practice, computation of the expected values in Eqs. (3) and (4) could be time consuming. An approximation can be obtained for these estimators. Let $X_i$, $i = 1, \ldots, n$, be the $X$-variable in a JPS sample. Let $Y^\top = (Y_1, \ldots, Y_{n(H-1)})$ be the unmeasured random variables on the additional units used to form $n$ sets, each of size $H$. We use the following algorithm to approximate the Rao-Blackwellized estimators for $m = 1, \ldots, M$.
I. Perform a random permutation on the entries of vector $Y$ to obtain $Y_m = \mathrm{permute}(Y)$.
II. Divide the entries of $Y_m$ into $n$ sets, each of size $H - 1$.
Table 1
Biases of MLEs and variance estimates, and relative efficiencies of BLUEs and MLEs for normal distribution with mean 10 and standard deviation 5. RE1 to RE6 are relative efficiencies with respect to the RMLE.

M   n   H   Bias µ̂_MLE  Bias σ̂_MLE  RE1    RE2    RE3    RE4    RE5    RE6
5   10  2   0.005       −0.327      1.286  5.460  1.069  3.865  1.134  1.074
5   10  3   0.003       −0.302      1.378  3.758  1.057  2.574  1.218  1.117
5   10  4   0.003       −0.231      1.362  3.123  1.035  2.205  1.250  1.198
5   10  5   −0.023      −0.268      1.432  2.748  1.022  1.821  1.312  1.241
5   15  2   0.028       −0.216      1.231  5.578  1.044  3.904  1.138  1.073
5   15  3   −0.003      −0.196      1.297  3.535  1.037  2.532  1.197  1.153
5   15  4   0.006       −0.183      1.367  2.928  1.032  2.112  1.275  1.221
5   15  5   0.013       −0.163      1.410  2.687  1.040  1.873  1.326  1.274
5   30  2   −0.004      −0.114      1.211  5.099  1.033  3.740  1.159  1.098
5   30  3   −0.002      −0.091      1.289  3.408  1.026  2.552  1.244  1.149
5   30  4   −0.005      −0.084      1.310  3.015  1.023  2.202  1.270  1.256
5   30  5   0.008       −0.071      1.372  2.725  1.011  1.878  1.329  1.289
10  10  2   −0.004      −0.316      1.321  5.605  1.082  3.836  1.167  1.082
10  10  3   0.020       −0.290      1.341  3.804  1.027  2.564  1.237  1.169
10  10  4   0.004       −0.255      1.478  3.316  1.031  2.110  1.330  1.202
10  10  5   0.012       −0.231      1.467  2.848  1.007  1.804  1.359  1.282
10  15  2   −0.028      −0.179      1.234  5.666  1.040  3.921  1.139  1.091
10  15  3   0.011       −0.181      1.349  3.805  1.030  2.512  1.247  1.147
10  15  4   −0.014      −0.172      1.408  3.119  1.028  2.099  1.327  1.275
10  15  5   0.004       −0.155      1.453  2.725  1.009  1.798  1.377  1.277
10  30  2   −0.012      −0.110      1.232  5.352  1.026  3.685  1.187  1.102
10  30  3   −0.010      −0.100      1.317  3.562  1.030  2.549  1.271  1.163
10  30  4   −0.009      −0.084      1.410  3.000  1.016  2.073  1.366  1.254
10  30  5   −0.000      −0.077      1.469  2.673  1.017  1.820  1.406  1.361
20  10  2   0.007       −0.364      1.322  5.565  1.068  3.741  1.166  1.098
20  10  3   −0.009      −0.288      1.442  3.953  1.037  2.460  1.259  1.173
20  10  4   −0.007      −0.267      1.476  3.232  1.016  2.062  1.352  1.240
20  10  5   −0.015      −0.265      1.521  2.831  1.012  1.778  1.398  1.289
20  15  2   0.009       −0.213      1.248  5.330  1.036  3.552  1.151  1.085
20  15  3   −0.006      −0.192      1.401  3.715  1.029  2.455  1.300  1.170
20  15  4   −0.013      −0.183      1.421  2.985  1.020  1.997  1.343  1.237
20  15  5   0.018       −0.157      1.536  2.989  1.022  1.827  1.447  1.374
20  30  2   0.001       −0.112      1.204  5.404  1.028  3.513  1.161  1.098
20  30  3   0.003       −0.090      1.342  3.591  1.026  2.511  1.289  1.189
20  30  4   0.001       −0.073      1.403  3.047  1.012  2.046  1.364  1.280
20  30  5   0.001       −0.078      1.540  2.851  1.024  1.854  1.476  1.402

Bias Var. Est columns B(V(µ̂)) and B(V(σ̂)) (row alignment of these two columns is not recoverable; values in extracted order):
−0.001 −0.018 −0.056 0.118 0.034 0.109 −0.008 −0.016 −0.010 −0.035 −0.097
0.005 −0.023 −0.030 −0.010 0.000 0.006 0.016 0.067 −0.004 0.026 0.006 0.025 −0.010 −0.014 −0.017 0.010 −0.005 −0.004 −0.051 −0.033 0.005 0.001 −0.012 −0.012 0.017 0.009 0.017 −0.006 0.005 −0.008
0.025 0.025 0.014 0.061 0.021 −0.010 −0.010 −0.081 0.075 −0.092 0.052 0.012 −0.053 0.027 −0.006 0.025 0.034 −0.006 0.018 −0.031 −0.155 −0.031 0.013 0.063 0.016 −0.018 −0.044 −0.007 0.018 0.008 0.006
III. Match these $n$ sets of size $H - 1$ with the $n$ $X$-measurements to form $n$ sets, each of size $H$. Obtain the ranks of the $X$-measurements in these sets, $R_m$.
IV. Compute $\hat{\theta}_{MLE}(R_m, X|X)$ and $\hat{\theta}_{BLUE}(R_m, X|X)$.

We then approximate the estimators from

$$\hat{\theta}_{RMLE} \approx \frac{1}{M} \sum_{m=1}^{M} \hat{\theta}_{MLE}(R_m, X|X), \tag{5}$$

and

$$\hat{\theta}_{RBLUE} \approx \frac{1}{M} \sum_{m=1}^{M} \hat{\theta}_{BLUE}(R_m, X|X). \tag{6}$$
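Steps I to IV and the average in Eq. (6) can be sketched as below for the BLUE. Several things here are assumptions made for illustration: the function names are hypothetical, the set size is $H = 2$ with the exact standard normal constants $\alpha_h$ and $\nu_h$, and the comparison units are given latent numeric values so that re-ranking can be emulated in code, whereas in practice each re-ranking is carried out by judgment without measuring $Y$. Rank vectors that place all units in a single judgment class are skipped, a choice the paper does not discuss.

```python
import random
from math import sqrt, pi

def blue(x, ranks, alpha, nu):
    """Conditional BLUE of (mu, sigma) given the ranks (generalized least squares)."""
    if len(set(ranks)) < 2:
        raise ValueError("fewer than two judgment classes observed")
    w = [1.0 / nu[r - 1] for r in ranks]
    a = [alpha[r - 1] for r in ranks]
    a11 = sum(w)
    a12 = sum(wi * ai for wi, ai in zip(w, a))
    a22 = sum(wi * ai * ai for wi, ai in zip(w, a))
    t1 = sum(wi * xi for wi, xi in zip(w, x))
    t2 = sum(wi * ai * xi for wi, ai, xi in zip(w, a, x))
    det = a11 * a22 - a12 * a12
    return (a22 * t1 - a12 * t2) / det, (a11 * t2 - a12 * t1) / det

def rao_blackwell(x, y, H, estimator, M, rng):
    """Average estimator(x, R_m) over M random re-groupings of the comparison units."""
    n = len(x)
    totals, used = [0.0, 0.0], 0
    for _ in range(M):
        ym = list(y)
        rng.shuffle(ym)                                                 # step I
        sets = [ym[i * (H - 1):(i + 1) * (H - 1)] for i in range(n)]    # step II
        ranks = [1 + sum(v < xi for v in s) for xi, s in zip(x, sets)]  # step III
        try:
            est = estimator(x, ranks)                                   # step IV
        except ValueError:
            continue  # assumption: skip degenerate rank vectors
        totals = [t + e for t, e in zip(totals, est)]
        used += 1
    if used == 0:
        raise ValueError("no usable rank vector was generated")
    return tuple(t / used for t in totals)

rng = random.Random(7)
H, n = 2, 12
alpha = [-1 / sqrt(pi), 1 / sqrt(pi)]      # exact values for H = 2, normal parent
nu = [1 - 1 / pi, 1 - 1 / pi]
x = [rng.gauss(10.0, 5.0) for _ in range(n)]
y = [rng.gauss(10.0, 5.0) for _ in range(n * (H - 1))]   # latent values, emulation only
theta_rblue = rao_blackwell(x, y, H, lambda xs, rs: blue(xs, rs, alpha, nu), M=20, rng=rng)
print(theta_rblue)
```

Passing the estimator as a function keeps the averaging loop the same for the MLE: any function that maps `(x, ranks)` to a pair of estimates can be supplied in place of `blue`.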
The selection of $M$ depends on the cost of the ranking procedure. If the ranking procedure is either time consuming or costly, a large value of $M$ may not be feasible. On the other hand, even small values of $M$, such as $M = 5$, could provide a significant improvement.

5. Finite sample simulation and comparison

In this section, we investigate the performance of the proposed estimators through a series of Monte Carlo simulations. Data sets in the simulation study are generated from normal and logistic distributions with mean 10 and scale 5, and from an exponential distribution with scale 5. The sample and set sizes are taken to be $n = 10, 15, 30$ and $H = 2, 3, 4, 5$, respectively. For each combination of simulation parameters, data sets are generated from the parent distributions based on 5000 repetitions. The Rao-Blackwellized estimators are computed from Eqs. (5) and (6) with $M = 5, 10, 20$. By construction, the BLUEs are unbiased. On the other hand, the MLEs may be biased for small sample sizes. Therefore, our simulation study considers the biases of the MLEs and of the variance estimates in Eq. (2), and the efficiencies of all estimators.
Table 2
Biases of MLEs and variance estimates, and relative efficiencies of BLUEs and MLEs for logistic distribution with mean 10 and standard deviation 5. RE1 to RE6 are relative efficiencies with respect to the RMLE.

M   n   H   Bias σ̂_MLE  RE1    RE2    RE3    RE4    RE5    RE6    B(V(µ̂))  B(V(σ̂))
5   10  2   −0.258      1.414  4.377  1.218  3.295  1.141  1.094  0.085    0.285
5   10  3   −0.230      1.471  3.341  1.152  2.522  1.209  1.146  0.051    0.187
5   10  4   −0.228      1.549  2.969  1.117  2.139  1.270  1.220  −0.122   0.006
5   10  5   −0.211      1.539  2.672  1.076  1.828  1.338  1.285  −0.037   −0.060
5   15  2   −0.190      1.381  4.284  1.189  3.221  1.146  1.087  0.060    0.070
5   15  3   −0.172      1.459  3.096  1.135  2.349  1.232  1.173  −0.064   0.013
5   15  4   −0.139      1.537  2.814  1.120  2.058  1.314  1.263  −0.024   0.010
5   15  5   −0.131      1.563  2.614  1.096  1.866  1.373  1.307  0.047    0.017
5   30  2   −0.117      1.324  4.039  1.174  3.094  1.132  1.088  −0.019   0.045
5   30  3   −0.080      1.492  3.041  1.174  2.325  1.254  1.166  −0.049   −0.061
5   30  4   −0.065      1.532  2.651  1.122  2.018  1.324  1.257  −0.023   0.036
5   30  5   −0.064      1.503  2.648  1.100  1.878  1.340  1.319  −0.008   −0.020
10  10  2   −0.258      1.530  4.705  1.224  3.245  1.171  1.089  −0.340   −0.107
10  10  3   −0.225      1.588  3.496  1.186  2.442  1.273  1.181  0.005    −0.065
10  10  4   −0.211      1.630  2.906  1.120  2.022  1.339  1.242  0.021    0.031
10  10  5   −0.219      1.637  2.732  1.094  1.807  1.373  1.334  0.016    −0.032
10  15  2   −0.165      1.447  4.393  1.212  3.203  1.179  1.087  0.106    −0.100
10  15  3   −0.167      1.507  3.388  1.143  2.355  1.278  1.191  0.082    0.002
10  15  4   −0.132      1.580  2.816  1.111  1.986  1.343  1.270  0.008    0.062
10  15  5   −0.149      1.622  2.658  1.082  1.770  1.388  1.362  −0.048   0.049
10  30  2   −0.090      1.374  4.317  1.181  3.066  1.163  1.084  0.001    −0.106
10  30  3   −0.078      1.517  3.176  1.136  2.404  1.326  1.215  −0.011   −0.002
10  30  4   −0.072      1.535  2.846  1.104  2.028  1.353  1.284  0.023    −0.002
10  30  5   −0.054      1.578  2.648  1.089  1.823  1.395  1.362  0.015    −0.007
20  10  2   −0.258      1.548  4.724  1.228  3.220  1.165  1.083  −0.448   −0.497
20  10  3   −0.232      1.552  3.364  1.132  2.362  1.256  1.178  −0.065   0.060
20  10  4   −0.197      1.561  3.077  1.099  2.055  1.327  1.283  0.216    −0.044
20  10  5   −0.217      1.630  2.875  1.073  1.790  1.428  1.416  0.093    −0.043
20  15  2   −0.160      1.411  4.391  1.185  3.108  1.176  1.110  0.086    −0.020
20  15  3   −0.130      1.552  3.289  1.137  2.345  1.317  1.185  −0.015   0.088
20  15  4   −0.139      1.606  2.942  1.099  2.006  1.393  1.256  −0.004   0.018
20  15  5   −0.131      1.653  2.721  1.074  1.805  1.433  1.420  −0.009   0.011
20  30  2   −0.089      1.353  4.126  1.172  2.994  1.159  1.116  0.044    0.058
20  30  3   −0.086      1.550  3.224  1.121  2.295  1.341  1.194  −0.069   −0.017
20  30  4   −0.082      1.536  2.763  1.091  2.002  1.366  1.330  0.015    −0.001
20  30  5   −0.069      1.663  2.796  1.088  1.804  1.466  1.425  −0.030   −0.020

Bias µ̂_MLE column (row alignment of this column is not recoverable; values in extracted order):
−0.076 −0.002 −0.006
0.031 0.038 −0.018 0.018 0.011 0.011 0.010 −0.003 −0.007 0.005 0.025 −0.045 0.023 −0.000 −0.030 −0.004 0.015 −0.041 0.009 −0.006 −0.006 −0.027 0.001 −0.044 −0.021 0.005 0.040 −0.005 0.030 −0.021 0.005 −0.011 0.005
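The relative efficiencies of the location and scale estimators are computed with respect to the Rao-Blackwellized MLEs:

$$RE_1 = \frac{MSE(\hat{\mu}_{BLUE})}{MSE(\hat{\mu}_{RMLE})}, \quad RE_2 = \frac{MSE(\hat{\sigma}_{BLUE})}{MSE(\hat{\sigma}_{RMLE})}, \quad RE_3 = \frac{MSE(\hat{\mu}_{RBLUE})}{MSE(\hat{\mu}_{RMLE})},$$

$$RE_4 = \frac{MSE(\hat{\sigma}_{RBLUE})}{MSE(\hat{\sigma}_{RMLE})}, \quad RE_5 = \frac{MSE(\hat{\mu}_{MLE})}{MSE(\hat{\mu}_{RMLE})}, \quad RE_6 = \frac{MSE(\hat{\sigma}_{MLE})}{MSE(\hat{\sigma}_{RMLE})}.$$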
Values of $RE_i > 1$, $i = 1, \ldots, 6$, indicate that the Rao-Blackwellized MLE is more efficient. Let

$$\widehat{\mathrm{Var}}(\hat{\mu}_{BLUE}) = \hat{\sigma}^2 \hat{A}^{-1}(1, 1) \quad \text{and} \quad \widehat{\mathrm{Var}}(\hat{\sigma}_{BLUE}) = \hat{\sigma}^2 \hat{A}^{-1}(2, 2)$$

be the variance estimators of the location and scale estimators in Eq. (2), where $\hat{A}^{-1}(i, i)$ is the $i$th diagonal element of $\hat{A}^{-1}$. The matrix $\hat{A}^{-1}$ in each of the 5000 replications of the simulation study is computed from the average of 2000 simulated multinomial random vectors with parameters $n$ and $(1/H, \ldots, 1/H)$ in Eq. (1). The biases of the variance estimates are then computed from

$$B(V(\hat{\mu})) = \frac{1}{5000} \sum_{j=1}^{5000} \hat{\sigma}^2_{BLUE,j} \hat{A}_j^{-1}(1, 1) - \frac{1}{4999} \sum_{j=1}^{5000} (\hat{\mu}_{BLUE,j} - \bar{\mu}_{BLUE})^2,$$

and

$$B(V(\hat{\sigma})) = \frac{1}{5000} \sum_{j=1}^{5000} \hat{\sigma}^2_{BLUE,j} \hat{A}_j^{-1}(2, 2) - \frac{1}{4999} \sum_{j=1}^{5000} (\hat{\sigma}_{BLUE,j} - \bar{\sigma}_{BLUE})^2,$$

where $\hat{\mu}_{BLUE,j}$ and $\hat{\sigma}_{BLUE,j}$ are the BLUEs of $\mu$ and $\sigma$ at the $j$th iteration, $\bar{\mu}_{BLUE}$ and $\bar{\sigma}_{BLUE}$ are the averages of the 5000 BLUE estimates, and $\hat{A}_j^{-1}(i, i)$ is the value of $\hat{A}^{-1}(i, i)$ at the $j$th iteration of the simulation. The values of $B(V(\hat{\mu}))$ and $B(V(\hat{\sigma}))$
Table 3
Biases of MLEs and variance estimates, and relative efficiencies of BLUEs and MLEs for exponential distribution with scale parameter 5.

M   n   H   Bias σ̂_MLE  RE2    RE4    RE6    B(V(σ̂))
5   10  2   0.007       1.144  0.998  1.143  0.114
5   10  3   0.021       1.201  0.993  1.204  0.074
5   10  4   0.033       1.289  0.996  1.291  0.064
5   10  5   0.018       1.348  0.990  1.353  0.022
5   15  2   −0.005      1.130  0.999  1.129  0.100
5   15  3   0.013       1.198  0.998  1.196  0.036
5   15  4   0.010       1.304  0.998  1.301  0.015
5   15  5   0.030       1.309  0.995  1.304  0.016
5   30  2   0.013       1.164  1.002  1.161  0.013
5   30  3   0.020       1.238  0.998  1.237  −0.002
5   30  4   0.004       1.262  1.001  1.255  0.010
5   30  5   0.003       1.364  1.001  1.357  0.005
10  10  2   0.031       1.163  1.007  1.154  0.165
10  10  3   0.017       1.254  0.990  1.261  0.105
10  10  4   0.013       1.304  0.993  1.307  0.082
10  10  5   0.003       1.374  0.994  1.376  0.018
10  15  2   0.022       1.151  0.997  1.156  0.043
10  15  3   0.002       1.269  0.998  1.268  0.021
10  15  4   0.019       1.306  0.995  1.306  0.032
10  15  5   0.029       1.370  1.000  1.370  0.047
10  30  2   0.000       1.142  0.999  1.143  0.018
10  30  3   0.007       1.260  1.005  1.250  0.010
10  30  4   0.002       1.311  1.004  1.300  0.005
10  30  5   0.011       1.422  1.002  1.415  0.008
20  10  2   0.039       1.160  1.005  1.151  0.131
20  10  3   0.007       1.253  0.993  1.256  0.067
20  10  4   0.026       1.319  0.993  1.317  0.053
20  10  5   0.005       1.406  0.993  1.406  0.046
20  15  2   −0.017      1.162  0.996  1.167  0.072
20  15  3   0.006       1.259  0.997  1.258  0.049
20  15  4   0.018       1.364  0.996  1.360  0.004
20  15  5   0.004       1.393  1.000  1.383  0.021
20  30  2   −0.006      1.181  1.000  1.182  0.001
20  30  3   −0.007      1.259  0.999  1.256  0.020
20  30  4   0.014       1.379  1.001  1.367  0.009
20  30  5   −0.004      1.461  0.997  1.451  0.009
are given in the last two columns of Tables 1 and 2 for the normal and logistic distributions, and $B(V(\hat{\sigma}))$ is given in Table 3 for the exponential distribution.

Table 1 presents the biases and relative efficiencies of the proposed estimators for the normal distribution. It is clear that both the MLE and the RMLE of the scale parameter have substantial negative biases for small sample sizes but, as expected, the biases shrink toward zero as the sample size gets large. The MLE and RMLE of the location parameter appear to have negligible biases for all sample sizes. We note that the MLE is asymptotically the best estimator. Hence, all relative efficiencies, as expected, are greater than 1, indicating that the Rao-Blackwellized MLE (RMLE) is the best estimator. From $RE_3$, we observe that the Rao-Blackwellized location BLUE has efficiencies comparable to those of the Rao-Blackwellized location MLE, which indicates that $\hat{\mu}_{RBLUE}$ is almost as efficient as $\hat{\mu}_{RMLE}$.

The Rao-Blackwellized estimators are computed with $M = 5, 10, 20$. In practice, we may not be able to select a very large value for $M$ due to the ranking cost. The simulation study suggests that even a small value such as $M = 5$ provides a significant improvement. For example, $RE_5$ and $RE_6$ for $M = 5$, $n = 30$, and $H = 3$ are 1.244 and 1.149, respectively. The same efficiency values for $M = 10$ and $M = 20$ are (1.271, 1.163) and (1.289, 1.189), respectively. As we see, when we go from $M = 5$ to $M = 20$ there is still an improvement in efficiency, but it is not as large as the improvement in going from $M = 1$ ($RE_5 = 1$, $RE_6 = 1$) to $M = 5$ ($RE_5 = 1.244$, $RE_6 = 1.149$). The last two columns in Table 1 indicate that the variance estimates of the BLUEs appear to be essentially unbiased.

Table 2 presents the biases and efficiency results for the logistic distribution. Again, the MLE and RMLE of the location parameter appear to have insignificant biases, while the biases of the MLE and RMLE of the scale parameter are substantial for small sample sizes. These biases decrease as the sample size increases.
The relative efficiencies of the estimators follow a pattern similar to the one observed in Table 1. From the last two columns in Table 2, we may conclude that the variance estimates of the BLUEs are unbiased.

Table 3 presents the biases and efficiencies of the proposed estimators of the scale parameter of the exponential distribution. For the exponential distribution, the MLE and RMLE appear to have no bias for the scale parameter. Again, RMLE is the most
efficient estimator as expected. It is clear from the column $RE_4$ that the Rao-Blackwellized BLUE has almost the same efficiency as the Rao-Blackwellized MLE of the scale parameter $\sigma$. Based on this limited simulation study, we may claim that $\hat{\sigma}_{RBLUE}$ is as efficient as $\hat{\sigma}_{RMLE}$. In Tables 1–3, the biases of the MLE and RMLE are essentially the same. For this reason, we only reported the biases of the MLEs. The last column in Table 3 also indicates that the variance estimate of the BLUE of $\sigma$ is unbiased.

6. Concluding remarks

In this paper, we developed inference based on a judgment post stratified (JPS) sample from a location and scale family of distributions. A JPS sample stratifies the sample units post-experimentally, after the data have been collected, based on their ranks (relative positions) in a set of size $H$. By conditioning on these ranks, we have constructed maximum likelihood estimators (MLEs) and best linear unbiased estimators (BLUEs) of the location and scale parameters. In a JPS sample, the ranks and the variable of interest have a strong dependence structure. Using this data structure, we constructed Rao-Blackwellized MLEs and BLUEs of the location and scale parameters. The Rao-Blackwellized estimators substantially improve the efficiency of the regular MLE and BLUE.

Acknowledgments

The authors would like to thank the referees for their helpful comments, which improved the presentation of the paper. The authors would also like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for funding this Research Group No. (RG-1435-056). A part of this work was completed during the first author's visit to King Saud University.

References

Arnold, B. C., Balakrishnan, N., & Nagaraja, H. N. (1992). A first course in order statistics. New York: John Wiley and Sons.
Balakrishnan, N., & Li, T. (2008). Ordered ranked set samples and application to inference. Journal of Statistical Planning and Inference, 138, 3512–3524.
Barnett, V., & Moore, K. (1997). Best linear unbiased estimates in ranked-set sampling with particular reference to imperfect ordering. Journal of Applied Statistics, 24, 697–710.
Chen, M., Ahn, S., Wang, X., & Lim, J. (2014). Generalized isotonized mean estimators for judgment post-stratification with multiple rankers. Journal of Agricultural, Biological, and Environmental Statistics, 19, 405–418.
Du, J., & MacEachern, S. N. (2008). Judgement post-stratification for designed experiments. Biometrics, 64, 345–354.
Frey, J., & Feeman, T. G. (2012). An improved mean estimator for judgment post-stratification. Computational Statistics & Data Analysis, 56, 418–426.
Frey, J., & Feeman, T. G. (2013). Variance estimation using judgment post-stratification. Annals of the Institute of Statistical Mathematics, 65, 551–569.
Frey, J., & Ozturk, O. (2011). Constrained estimation using judgment post-stratification. Annals of the Institute of Statistical Mathematics, 63, 769–789.
Hollander, M., Wolfe, D. A., & Chicken, E. (2014). Nonparametric statistical methods. New Jersey: Wiley.
Lloyd, E. H. (1952). Least squares estimation of location and scale parameters using order statistics. Biometrika, 39, 88–95.
MacEachern, S. N., Stasny, E. A., & Wolfe, D. A. (2004). Judgment post-stratification with imprecise rankings. Biometrics, 60, 207–215.
Ozturk, O. (2011). Parametric estimation of location and scale parameters in ranked set sampling. Journal of Statistical Planning and Inference, 141(4), 1616–1622.
Ozturk, O. (2012). Combining ranking information in judgment post stratified and ranked set sampling designs. Environmental and Ecological Statistics, 19, 73–93.
Ozturk, O. (2013). Combining multi-observer information in partially rank-ordered judgment post-stratified and ranked set samples. The Canadian Journal of Statistics/La Revue Canadienne de Statistique, 41(2), 304–324.
Ozturk, O. (2014). Statistical inference for population quantiles and variance in judgment post-stratified samples. Computational Statistics & Data Analysis, 77, 188–205.
Stokes, S. L. (1995). Parametric ranked set sampling. Annals of the Institute of Statistical Mathematics, 47, 465–482.
Stokes, S. L., Wang, X., & Chen, M. (2007). Judgment post stratification with multiple rankers. Journal of Statistical Theory and Applications, 6, 344–359.
Wang, X., Lim, J., & Stokes, S. L. (2008). A nonparametric mean estimator for judgment post-stratified data. Biometrics, 64, 355–363.
Wang, X., Stokes, L., Lim, J., & Chen, M. (2006). Concomitant of multivariate order statistics with application to judgment poststratification. Journal of the American Statistical Association, 101(476), 1693–1704.
Wolfe, D. A. (2012). Ranked set sampling: Its relevance and impact on statistical inference. ISRN Probability and Statistics. http://dx.doi.org/10.5402/2012/568385.