Statistics & Probability Letters 48 (2000) 287 – 292
Estimation of the support of a discrete distribution

Nabendu Pal^a, Wei-Hsiung Shen^b, Bimal K. Sinha^{c,*}

a Department of Mathematics, University of Southwestern Louisiana, Lafayette, LA 70504, USA
b Department of Statistics, Tunghai University, Taichung, Taiwan
c Department of Mathematics and Statistics, University of Maryland, Baltimore County Campus, Baltimore, MD 21228, USA
* Corresponding author. Tel.: +1-410-455-2347; fax: +1-410-455-1066. E-mail address: [email protected] (B.K. Sinha).

Received April 1999; received in revised form November 1999
Abstract

Let $Y$ be a positive integer-valued random variable with the probability mass function $P(Y = y) = f(y, r)/a(\theta)$, $y = r, r+1, \ldots, \theta$, where $r$ is a known positive integer and $\theta \in \Theta = \{r, r+1, \ldots\}$ is an unknown parameter. We show that, for estimating $\theta$, $cY$ is inadmissible under both 0–1 and a general loss whenever $0 < c < 1$. Under some mild conditions on $f(y, r)$, we prove that $Y$ is admissible and minimax under both 0–1 and squared error loss. As an application, we consider the problem of estimating the size $\theta$ of a finite population whose elements are labeled from 1 to $\theta$, based on a simple random sample of size $n$ drawn either with or without replacement. Admissibility and minimaxity of $Y$, the largest number observed in the sample, under 0–1 and squared error loss hold in both sampling situations. We propose two integer-valued estimators of $\theta$ of the form $[cY]$ for $c > 1$ in the case of sampling with replacement and discuss their bias and mean-squared error ($[cY]$ denotes the integer nearest to $cY$). © 2000 Elsevier Science B.V. All rights reserved.

MSC: 62G05

Keywords: Admissible; Hammersley–Chapman–Robbins inequality; Mean-squared error; Minimax; Population size; Squared error loss; 0–1 loss
1. Introduction

Let $Y$ be a positive integer-valued random variable with the probability mass function

$$P_\theta(y) = P_\theta(Y = y) = \frac{f(y, r)}{a(\theta)}, \qquad y = r, r+1, \ldots, \theta, \tag{1.1}$$

where $a(\theta) = \sum_{y=r}^{\theta} f(y, r)$ is the normalizing constant, $r$ is a known positive integer, and $\theta \in \Theta = \{r, r+1, \ldots\}$ is an unknown parameter. We consider the problem of estimating $\theta$ based on $Y$, and show that, quite generally, as expected,
$cY$ is inadmissible under both 0–1 and a general loss whenever $0 < c < 1$. Under some mild conditions on $f(y, r)$, and hence on $a(\theta)$, we prove that $Y$ is admissible and minimax under both 0–1 and squared error loss. Details are given in Section 2. As an application, we consider in Section 3 the problem of estimating the unknown size $\theta$ of a finite population whose elements are labeled from 1 to $\theta$, based on a simple random sample of size $n$ drawn either with or without replacement. Some variations of this problem have a long and rich history (Seber, 1982; Boswell et al., 1988; Hossain, 1995; Sengupta and De, 1997). Examples of serially numbered populations include the number of telephones installed by a company in a given year in a city, the number of taxis in New York City, the number of purchases made on credit cards issued by a bank over a certain period of time, and so on. Admissibility and minimaxity of $Y$, the largest number observed in the sample, under 0–1 as well as squared error loss hold in both sampling situations. We propose two additional meaningful integer-valued estimators of $\theta$ of the form $[cY]$ for $c > 1$ in the case of sampling with replacement and discuss their bias and mean-squared error. Here $[cY]$ denotes the integer nearest to $cY$.
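For concreteness, a minimal Python sketch of model (1.1) follows; the function names are ours, and the weight function used for the demonstration anticipates the with-replacement case of Section 3.1:

```python
# Minimal sketch of model (1.1): P_theta(Y = y) = f(y, r) / a(theta) for
# y = r, ..., theta, with normalizer a(theta) = sum_{y=r}^{theta} f(y, r).
# The names pmf / f_with_repl are ours, not the paper's.

def pmf(y, theta, r, f):
    """P_theta(Y = y) under model (1.1) for a given weight function f(y, r)."""
    if not (r <= y <= theta):
        return 0.0
    a = sum(f(t, r) for t in range(r, theta + 1))  # a(theta)
    return f(y, r) / a

# With-replacement special case of Section 3.1: Y is the maximum of n draws
# from {1, ..., theta}, so r = 1 and f(y, r) = y^n - (y-1)^n.
n = 5
f_with_repl = lambda y, r: y**n - (y - 1)**n

theta = 10
total = sum(pmf(y, theta, 1, f_with_repl) for y in range(1, theta + 1))
print(total)  # sums to 1 (up to floating point): (1.1) is a proper pmf
```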
2. Some general results

We first prove the following result.

Theorem 2.1. Let $L(\delta, \theta)$ be a general loss function, nondecreasing in $|\delta - \theta|$. Under model (1.1), the estimator $cY$ is inadmissible for estimating $\theta$ under both 0–1 loss and $L(\delta, \theta)$ whenever $0 < c < 1$.

Proof. We show that $Y$ dominates $cY$ under both loss functions. Clearly, under 0–1 loss, the risk of $Y$ is $R_0(Y \mid \theta) = P_\theta(Y \ne \theta) = 1 - P_\theta(Y = \theta)$, while that of $cY$ is $R_0(cY \mid \theta) = P_\theta(cY \ne \theta) = 1 - P_\theta(Y = \theta/c) = 1$, since $Y \le \theta$ with probability 1 and $0 < c < 1$ together imply $P_\theta(Y = \theta/c) = 0$. Thus $cY$ is dominated by $Y$. On the other hand, under the loss $L(\delta, \theta)$, since $cY < Y \le \theta$, it is immediate that $L(Y, \theta) \le L(cY, \theta)$ for all $Y$ and $\theta$. This completes the proof of the theorem.

We now show that, under some mild conditions on $f(\theta, r)$, and hence on $a(\theta)$, $Y$ is both admissible and minimax for estimating $\theta$ under 0–1 and squared error loss. We first consider 0–1 loss.

Theorem 2.2. For model (1.1), assume that (i) $\sum_{\theta=r}^{\infty} \{\theta a(\theta)\}^{-1} < \infty$; (ii) $\theta a(\theta)$ is nondecreasing in $\theta$; (iii) $\inf_{\theta \in \Theta} [f(\theta, r)/a(\theta)] = 0$; and (iv) $\sum_{\theta=r}^{\infty} f(\theta, r)/\{\theta a(\theta)\} < \infty$. If (i) and (ii) hold, $Y$ is admissible for $\theta$ under 0–1 loss. If (iii) and (iv) hold, $Y$ is minimax for $\theta$ under 0–1 loss.

Proof. Consider a sequence of proper priors $\{\pi_m\}$ given by $\pi_m(\theta) = c_m \theta^{-1-1/m}$, $\theta = r, r+1, \ldots$, where $c_m = [\sum_{\theta=r}^{\infty} \theta^{-1-1/m}]^{-1}$ is the normalizing constant and $m \ge 1$. Note that the proper sequence $\{\pi_m\}$ approaches $\pi_\infty$, a generalized (improper) prior, as $m \to \infty$. Using (1.1), it readily follows that the posterior distribution of $\theta$, given $Y = y$, is

$$\pi_m(\theta \mid y) = \left[\sum_{t=y}^{\infty} \frac{1}{t^{1+1/m}\, a(t)}\right]^{-1} \frac{1}{\theta^{1+1/m}\, a(\theta)}, \qquad \theta = y, y+1, \ldots, \tag{2.1}$$
which is proper by assumption (i). By assumption (ii), the posterior is monotonically decreasing in $\theta$, and hence its mode occurs at $\theta = y$. Since the Bayes estimate under 0–1 loss is the posterior mode, $Y$ is a proper Bayes estimate, and hence admissible.
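To illustrate, a small numerical sketch (ours; it uses the uniform special case $f(y, r) = 1$, $a(\theta) = \theta$, and truncates the infinite series) confirms that the posterior (2.1) decreases in $\theta$, so its mode sits at $\theta = y$:

```python
# Numerical check that the posterior (2.1) is decreasing in theta, so that
# its mode is at theta = y.  Uniform special case: f(y, r) = 1, a(theta) = theta.
# The truncation point TRUNC for the infinite series is our choice.
m, y = 2.0, 7
TRUNC = 100_000

weights = [1.0 / (t ** (1.0 + 1.0 / m) * t)   # 1 / (t^{1+1/m} a(t)) with a(t) = t
           for t in range(y, TRUNC)]
norm = sum(weights)
posterior = [w / norm for w in weights]

print(all(posterior[i] > posterior[i + 1] for i in range(20)))  # True: decreasing
print(posterior[0] > posterior[1])  # True: the mode is at theta = y
```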
To prove the minimaxity of $Y$ under 0–1 loss, note that the risk of $Y$ is $R_1(Y \mid \theta) = 1 - P_\theta(Y = \theta) = 1 - f(\theta, r)/a(\theta)$, so that $\sup_\theta R_1(Y \mid \theta) = 1 - \inf_\theta f(\theta, r)/a(\theta) = 1$, by assumption (iii). On the other hand, the Bayes risk of $Y$ under the prior $\pi_m$ is $r_1(Y \mid \pi_m) = 1 - \sum_{\theta=r}^{\infty} P_\theta(Y = \theta)\, \pi_m(\theta) = 1 - c_m \sum_{\theta=r}^{\infty} f(\theta, r)/\{\theta^{1+1/m} a(\theta)\}$. Thus $\lim_{m\to\infty} r_1(Y \mid \pi_m) = 1 - \lim_{m\to\infty} c_m [\sum_{\theta=r}^{\infty} f(\theta, r)/\{\theta^{1+1/m} a(\theta)\}] = 1$, by assumption (iv), since $\lim_{m\to\infty} c_m = 0$. This shows that, for the sequence of proper priors $\{\pi_m\}$, $\lim_{m\to\infty} r_1(Y \mid \pi_m) = 1 = \sup_\theta R_1(Y \mid \theta)$, proving the minimaxity of $Y$ by a standard result (Lehmann, 1983, Theorem 2.2, p. 256).

We now turn our attention to squared error loss and prove the following result.

Theorem 2.3. For model (1.1), $Y$ is admissible for $\theta$ under squared error loss. Furthermore, $Y$ is minimax for $\theta$ whenever $\lim_{\theta\to\infty} a(\theta)/a(\theta - 1) = 1$.

Proof. To prove the admissibility of $Y$, note that if $Y$ is not admissible, there exists $h(Y)$ such that

$$E_\theta(h(Y) - \theta)^2 \le E_\theta(Y - \theta)^2 \quad \text{for all } \theta \in \Theta, \tag{2.2}$$

with strict inequality for some $\theta$. Taking successively $\theta = r, r+1, \ldots$, and recalling that $Y$ takes values from $r$ to $\theta$, it follows easily that $h(r) = r$, $h(r+1) = r+1$, and so on. For example, when $\theta = r$, since $Y = r$ with probability one, (2.2) readily yields $h(r) = r$. For $\theta = r+1$, since $Y$ takes the values $r$ and $r+1$, (2.2) together with the fact that $h(r) = r$ immediately gives $h(r+1) = r+1$. Thus $h(y) = y$ for all $y$ (easily seen by induction), contradicting strict inequality for some $\theta$, and $Y$ is admissible.

To prove the minimaxity of $Y$, we use the Hammersley–Chapman–Robbins (HCR) inequality (Lehmann, 1983, inequality (6), p. 116). We show that for any estimate $h(Y)$ of $\theta$,

$$\sup_\theta E_\theta(h(Y) - \theta)^2 = \infty. \tag{2.3}$$

Towards this end, note that the support $S(\theta)$ of the distribution of $Y$ under $\theta$ given in (1.1) contains the support $S(\theta - 1)$ of the same distribution under $\theta - 1$. Therefore, by the HCR inequality,

$$E_\theta(h(Y) - \theta)^2 \ge b^2(\theta) + \frac{[b(\theta) - b(\theta - 1) + 1]^2}{E_\theta[P_{\theta-1}(Y)/P_\theta(Y) - 1]^2}, \tag{2.4}$$

where $b(\theta)$ is the bias of $h(Y)$. Using (1.1), the expectation in the denominator of (2.4) reduces to $a(\theta)/a(\theta - 1) - 1$ (see the computation displayed after (2.5)), so that (2.4) simplifies to

$$E_\theta(h(Y) - \theta)^2 \ge b^2(\theta) + \frac{[b(\theta) - b(\theta - 1) + 1]^2}{a(\theta)/a(\theta - 1) - 1}. \tag{2.5}$$
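For completeness, the simplification leading to (2.5), which the proof leaves implicit, can be spelled out (our computation): since $P_{\theta-1}(y) = f(y, r)/a(\theta - 1)$ for $y \le \theta - 1$ and $P_{\theta-1}(\theta) = 0$,

$$E_\theta\!\left[\frac{P_{\theta-1}(Y)}{P_\theta(Y)} - 1\right]^2 = \left[\frac{a(\theta)}{a(\theta-1)} - 1\right]^2 \frac{a(\theta-1)}{a(\theta)} + \frac{a(\theta) - a(\theta-1)}{a(\theta)} = \frac{a(\theta)}{a(\theta-1)} - 1,$$

using $\sum_{y=r}^{\theta-1} f(y, r) = a(\theta - 1)$ and $f(\theta, r) = a(\theta) - a(\theta - 1)$.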
If $b(\theta)$ has a finite limit as $\theta \to \infty$, then $b(\theta) - b(\theta - 1) + 1 \to 1$ while the denominator in (2.5) tends to 0 by our assumption on $a(\theta)$, so $E_\theta(h(Y) - \theta)^2$ approaches $\infty$. Otherwise, it is obvious that (2.3) holds as well, thus proving the minimaxity of $Y$.

Remark 2.1. Under squared error loss, taking the prior as $\pi(\theta) \propto a(\theta)e^{-c\theta}$ for $c > 0$, we get the admissibility of $Y + 1/(e^c - 1)$, i.e., of linear estimates of the form $Y + d$ for $d > 0$, whose minimaxity is also obvious from Theorem 2.3. Of course, it is assumed here that $\sum_{\theta=r}^{\infty} a(\theta)e^{-c\theta} < \infty$.

Remark 2.2. It is tempting to conclude the admissibility of estimators of the form $cY$ for $c > 1$ under squared error loss, whose minimaxity is obvious from Theorem 2.3. However, this need not be true. Clearly, the risk of $cY$ under squared error loss is minimized at $c_{\mathrm{opt}} = \theta E_\theta(Y)/E_\theta(Y^2)$. Using the obvious bounds
$[E_\theta(Y)]^2 \le E_\theta(Y^2) \le \theta E_\theta(Y)$, we get

$$1 \le c_{\mathrm{opt}} \le \frac{\theta}{E_\theta(Y)}. \tag{2.6}$$
It therefore follows that if an upper bound, say $\Delta$, is known for $\theta/E_\theta(Y)$, then $cY$ is inadmissible for $\theta$ under squared error loss whenever either $c < 1$ or $c > \Delta$. Remark 3.1 in the next section provides an expression for $\Delta$ in a special case. Of course, the admissibility of $cY$ for $1 < c \le \Delta$ does not follow from the previous argument and needs to be settled separately.

3. An application

In this section we apply the preceding results to the problem of estimating the size $\theta$ of a finite population whose elements are labeled from 1 to $\theta$, on the basis of a simple random sample of size $n$. Obviously, we can take $Y$ to be the largest number observed in the sample, which is sufficient for $\theta$.

3.1. Sampling with replacement

When sampling is done with replacement, it is clear that $Y$ has the distribution (1.1) with $r = 1$, $f(y, r) = y^n - (y-1)^n$ and $a(\theta) = \theta^n$. The first two assumptions in Theorem 2.2 are trivially satisfied. For (iii) and (iv), it is enough to note that $f(\theta, r)/a(\theta) = 1 - (1 - 1/\theta)^n = \binom{n}{1}(1/\theta) - \binom{n}{2}(1/\theta^2) + \cdots$. Moreover, the assumption in Theorem 2.3 is obvious. Hence $Y$ is admissible and minimax for $\theta$ under 0–1 loss as well as squared error loss. In fact, by Remark 2.1, all estimators of the form $Y + d$ for $d > 0$ are admissible and minimax under squared error loss. We conclude this subsection by proposing two other natural estimators of $\theta$ of the form $[cY]$ for $c > 1$. It is well known (Feller, 1968, p. 226) that

$$E_\theta(Y) = \theta - \sum_{y=1}^{\theta} \left(\frac{y-1}{\theta}\right)^n. \tag{3.1}$$
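A quick numerical check of (3.1), comparing it with the expectation computed directly from the exact pmf of the maximum (a sketch of ours, not part of the paper):

```python
# Check (3.1): E(Y) = theta - sum_{y=1}^{theta} ((y-1)/theta)^n, where Y is
# the maximum of n draws with replacement from {1, ..., theta}.
n, theta = 10, 50

# Expectation computed directly from the exact pmf of the maximum.
exact = sum(y * (y**n - (y - 1)**n) / theta**n for y in range(1, theta + 1))

# Right-hand side of (3.1).
formula = theta - sum(((y - 1) / theta) ** n for y in range(1, theta + 1))

print(exact, formula)  # the two values agree (up to floating point)
```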
We first propose a 'moment'-type estimate of $\theta$ by equating $Y$ to its expectation. Using the fact that

$$\sum_{y=1}^{\theta} (y - 1)^n \sim \frac{1}{n+1}(\theta - 1)^{n+1} \sim \frac{1}{n+1}\theta^{n+1},$$

so that $E_\theta(Y) \approx \theta - \theta/(n+1) = n\theta/(n+1)$ by (3.1), we can suggest

$$\hat{\theta}_1 = \frac{n+1}{n}\, Y. \tag{3.2}$$

We next propose a Bayes estimate of $\theta$. Under a uniform prior on $\Theta$, the posterior of $\theta$ given $Y = y$ is proportional to $1/a(\theta) = \theta^{-n}$ for $\theta \ge y$, and treating $\theta$ as continuous we get

$$E(\theta \mid Y = y) \approx \frac{\int_y^\infty \theta^{-n+1}\, d\theta}{\int_y^\infty \theta^{-n}\, d\theta} = \frac{n-1}{n-2}\, y, \qquad n > 2, \tag{3.3}$$

resulting in (for $n > 2$)

$$\hat{\theta}_2 = \frac{n-1}{n-2}\, Y. \tag{3.4}$$
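The exact bias and rmse values reported in Tables 1 and 2 below were computed by the authors in SAS; as a minimal sketch, not the original SAS code, the same quantities can be obtained directly from the pmf of $Y$, with the nearest-integer convention for $[cY]$ approximated by Python's round():

```python
# Exact bias and root-mean-squared error of the rounded estimators
# theta1_hat = [(n+1)/n * Y] and theta2_hat = [(n-1)/(n-2) * Y] under
# sampling with replacement, using P(Y = y) = (y^n - (y-1)^n) / theta^n.
# A re-creation of the paper's SAS computation, not the original code.

def bias_rmse(c, n, theta):
    bias = mse = 0.0
    for y in range(1, theta + 1):
        p = (y**n - (y - 1)**n) / theta**n    # exact pmf of the maximum
        err = round(c * y) - theta            # error of [cY]
        bias += err * p
        mse += err * err * p
    return bias, mse ** 0.5                   # rmse = sqrt(MSE)

n, theta = 10, 50
print(bias_rmse((n + 1) / n, n, theta))        # theta1_hat: close to (0.64, 4.63) in Table 1
print(bias_rmse((n - 1) / (n - 2), n, theta))  # theta2_hat: close to (1.69, 4.89) in Table 1
```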
Table 1
Bias and root-mean-squared errors of θ̂1 and θ̂2

  n     θ     Bias θ̂1    Bias θ̂2    Rmse θ̂1    Rmse θ̂2
 10    50      0.64        1.69        4.63        4.89
 10   100      0.62        2.87        9.18        9.76
 10   200      0.61        5.18       18.29       19.39
 10   500      0.61       11.98       45.66       48.19
 20    50      0.42        1.01        2.60        2.66
 20   100      0.62        1.04        4.81        4.99
 20   200      0.61        1.63        9.60        9.73
 20   500      0.58        3.22       23.87       24.21
Table 2
Bias and root-mean-squared errors of Y, θ̂1 and θ̂2

  n     θ     Bias Y    Bias θ̂1    Bias θ̂2    Rmse Y    Rmse θ̂1    Rmse θ̂2
 20    30     −0.98       0.51        0.96       1.66       1.81        1.79
 40    60     −1.02       0.47        0.92       1.73       1.85        1.83
Incidentally, we may recall (Rohatgi, 1976, p. 357) that the uniformly minimum variance unbiased estimator (UMVUE) of $\theta$ in this case is given by $\hat{\theta}_{\mathrm{umvue}} = [Y^{n+1} - (Y-1)^{n+1}]/[Y^n - (Y-1)^n]$.

Exact bias and root-mean-squared errors (rmse) of $\hat{\theta}_1$ and $\hat{\theta}_2$ are reported in Table 1 for $n = 10, 20$ and $\theta = 50, 100, 200, 500$. The numerical computations were done in SAS running on a 586 PC. The performance of both $\hat{\theta}_1$ and $\hat{\theta}_2$ is quite impressive, especially in terms of bias, which remains at a remarkable level of below 1 for $\hat{\theta}_1$ and at a fairly low level for $\hat{\theta}_2$ even when the true value of $\theta$ is quite large. We also find that $\hat{\theta}_1$ has a slight edge over $\hat{\theta}_2$ in terms of both bias and rmse. Our computations for $Y$ (not shown here) reveal that its (negative) bias is quite large compared to the above two estimators, making its rmse typically much larger than those of $\hat{\theta}_1$ and $\hat{\theta}_2$. Nevertheless, we have demonstrated in Table 2 two cases where $Y$ outperforms both $\hat{\theta}_1$ and $\hat{\theta}_2$ in terms of rmse. Based on the above findings, our recommendation is to use $\hat{\theta}_1$.

3.2. Sampling without replacement

When sampling is done without replacement, it is clear that $Y$ again has the distribution (1.1), now with $r = n$, $f(y, r) = \binom{y-1}{n-1}$ and $a(\theta) = \binom{\theta}{n}$. Assumptions (i) and (ii) in Theorem 2.2 are trivially satisfied. For (iii) and (iv), it is enough to note that $f(\theta, r)/a(\theta) = n/\theta$. Finally, for the assumption in Theorem 2.3, note that $a(\theta)/a(\theta - 1) = \theta/(\theta - n)$, $\theta > n$. Hence $Y$ is admissible and minimax for $\theta$ under both 0–1 and squared error loss. In fact, as in the previous case, all estimators of the form $Y + d$ for $d > 0$ are admissible and minimax under squared error loss (by Remark 2.1). We should note that the UMVUE of $\theta$ in this case is given by

$$\hat{\theta}_{\mathrm{umvue}} = \frac{n+1}{n}\, Y - 1. \tag{3.5}$$
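A short numerical verification (ours) that (3.5) is unbiased under the pmf of this subsection:

```python
# Verify that (n+1)/n * Y - 1 is unbiased for theta under sampling without
# replacement, where P(Y = y) = C(y-1, n-1) / C(theta, n), y = n, ..., theta.
from math import comb

n, theta = 5, 30
expectation = sum(((n + 1) / n * y - 1) * comb(y - 1, n - 1) / comb(theta, n)
                  for y in range(n, theta + 1))
print(expectation)  # 30.0 (= theta, up to floating-point error)
```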
Remark 3.1. Following Remark 2.2 and using (3.1) along with its subsequent approximation, it turns out that in the case of sampling with replacement, $\Delta$ is indeed bounded above by $1 + 1/n$. Moreover, using (3.5),
the same value of $\Delta$ holds in the case of sampling without replacement, since the unbiasedness of (3.5) gives $E_\theta(Y) = n(\theta + 1)/(n+1)$, whence $\theta/E_\theta(Y) \le (n+1)/n$. We thus conclude that in both cases $cY$ is inadmissible for $\theta$ under squared error loss whenever $c$ exceeds $1 + 1/n$. Interestingly, and rather surprisingly, since $(n-1)/(n-2) > (n+1)/n$ for every $n > 2$, this leads to the inadmissibility of the approximate Bayes estimate $\{(n-1)/(n-2)\}Y$ (see (3.3))! We conclude by noting that the admissibility status of the estimators of the form $[cY]$ for $c > 1$ is not obvious and requires further investigation.

Acknowledgements

Our sincere thanks are due to an anonymous referee for some excellent constructive comments which led to an improved version. We also thank Professor J.S. Huang and Professor Y.H. Wang for some helpful discussions.

References

Boswell, M.T., Burnham, K.P., Patil, G.P., 1988. Role and use of composite sampling and capture–recapture sampling in ecological studies. In: Krishnaiah, P.R., Rao, C.R. (Eds.), Handbook of Statistics, Vol. 6. North-Holland, Amsterdam, pp. 469–488.
Feller, W., 1968. An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd Edition. Wiley, New York.
Hossain, M.F., 1995. Unknown population size estimation: an urn model approach. J. Statist. Studies 15, 89–94.
Lehmann, E.L., 1983. Theory of Point Estimation. Wiley, New York.
Rohatgi, V.K., 1976. An Introduction to Probability Theory and Mathematical Statistics. Wiley, New York.
Seber, G.A.F., 1982. The Estimation of Animal Abundance and Related Parameters, 2nd Edition. Macmillan, New York.
Sengupta, S., De, M., 1997. On the estimation of a finite population. Sankhyā B 59, 66–75.