A note on sequential estimation of the size of a population under a general loss function

Statistics & Probability Letters 47 (2000) 159 – 164

Z.D. Bai^a, Mosuk Chow^b,∗

^a Department of Statistics and Applied Probability, National University of Singapore, Singapore
^b Department of Statistics, Penn State University, 326 Thomas Building, University Park, PA 16802, USA

Received June 1998; received in revised form July 1999

Abstract

In estimating the size of a finite population under a sequential sampling scheme where the stopping rule is to stop sampling when a fixed number of marked items have been observed, it has been shown that the maximum likelihood estimator (MLE) does not have an explicit expression and is inadmissible under weighted-squared-error loss. This note shows that the MLE is inadmissible under a very general class of loss functions. A class of estimators which dominate the MLE is also constructed and given in the article. Finally, an optimal class of estimators for some commonly used loss functions is derived. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Admissibility; Capture–recapture; Maximum likelihood estimator; Sequential sampling; Loss function; Risk function

1. Introduction

In the sequential estimation of the size of a finite population, items are drawn one at a time with replacement. Suppose that the population consists of N (unknown) items which are initially unmarked. Any unmarked item is marked before being replaced, whereas marked items are returned unchanged. Several stopping rules have been proposed in the literature and various estimators of N have been formulated. Samuel (1968) considered five stopping rules and presented the asymptotic distributions of the maximum likelihood estimators under the various stopping rules. As mentioned in Samuel (1968), the most often used stopping rule is to stop sampling when a fixed number, say B, of marked items has been observed. This stopping rule will be denoted by S_B and is assumed for the remainder of the paper. A natural and commonly used estimator for N is the maximum likelihood estimator (MLE); here we investigate the optimality properties of the MLE. For more references on the sequential estimation setup, see Goodman (1949), Chapman (1954), Darroch (1958), Darling and Robbins (1967) and Leite et al. (1988). Seber (1985) provides a good bibliography of previous works.

∗ Corresponding author.

0167-7152/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0167-7152(99)00152-2


An estimator is admissible if and only if there does not exist another estimator that performs uniformly at least as well in terms of risk and performs strictly better in at least one case. As mentioned in Brown (1975), under most circumstances, if one looks at the statistical problem through the risk function, admissibility appears as a minimal requirement which an estimator must satisfy in order to be used: an experimenter should not use an inadmissible estimator. Admissibility of the MLE for discrete problems has been studied in many previous papers; see, for example, Hwang (1982), Chow (1990) and Brown et al. (1992). In Bai and Chow (1991), the admissibility of the MLE under squared error loss was investigated and the MLE was shown to be inadmissible. Some may attribute this result to the undesirable property of squared error loss being both unbounded and convex, so that large errors may be penalized much too severely, as mentioned in Berger (1985). It is thus of interest to consider the admissibility of the MLE under other types of loss functions, especially non-convex loss functions. In this note, we address the problem by taking a decision-theoretic approach for a general loss function which may be non-convex and bounded. In our setup, the loss function is only assumed to be symmetric and non-decreasing on the right half line; thus, squared error loss is a special case of our general loss function. The MLE will be shown to be inadmissible by means of a general result that we establish as Lemma 1. A remarkable feature of our work is that the MLE is inadmissible even for bounded, non-convex losses. Furthermore, an optimal class of estimators for some commonly used loss functions will be derived.

Admissibility of the uniformly minimum variance unbiased estimator (UMVUE) for various estimation problems has also been considered in the literature. Chow and Fong (1992) showed that the UMVUE in simultaneous estimation of the Hardy–Weinberg proportions is inadmissible under the sum of squared error losses. For the problem considered here, it is known that a UMVUE exists (see Seber (1985) for details). Lemma 1 may be applied to show that the UMVUE for this sampling scheme is inadmissible under the general loss function, since the UMVUE has the same asymptotic distribution as the maximum likelihood estimator (see Goodman, 1953). In Section 2, Lemma 1 is stated and proved; using the result of the lemma, the inadmissibility of the MLE is formally stated in Theorem 1. In Section 3, the method of computing the optimal choices for the improved estimator (asymptotically) is given.
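The sampling scheme S_B described above is easy to simulate. The sketch below is our own illustrative code (the helper names are ours, not from the paper): it draws items with replacement until B already-marked items have been observed, and adds a crude point estimate obtained by matching B to the approximate expected number of marked draws, t²/(2N), for large N — a simplification of ours; the paper's MLE instead maximizes the exact likelihood given in Section 2.

```python
import random

def sample_until_B_marked(N, B, rng=random):
    """Simulate stopping rule S_B: draw from a population of N items
    uniformly with replacement; an unmarked item is marked before being
    returned, while marked items are returned unchanged.  Stop once B
    draws of already-marked items have occurred; return T, the total
    number of draws."""
    marked = set()
    t = 0
    b = 0  # marked items observed so far
    while b < B:
        t += 1
        item = rng.randrange(N)
        if item in marked:
            b += 1
        else:
            marked.add(item)
    return t

def rough_estimate(t, B):
    """Crude estimate of N: solve B = t^2 / (2N) for N (our shortcut,
    not the MLE)."""
    return t * t / (2 * B)
```

With N = 1 and B = 1 the single item is marked on the first draw and observed marked on the second, so T = 2 deterministically; in general T always lies in {B + 1, …, B + N}.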

2. Loss function and the inadmissibility of the MLE

In Bai and Chow (1991), it was shown that the MLE is inadmissible for the sequential estimation of the size of a finite population under the weighted-squared-error loss. Here we consider a class of loss functions which includes the weighted-squared-error loss as a special case and show that the MLE is inadmissible under this whole class. The loss functions need only satisfy some reasonable conditions, and the class includes most commonly used losses. Let θ denote the unknown parameter and δ the decision rule. The loss functions considered are of the form

L(θ, δ) = φ((δ − θ)/θ),   (2.1)

where φ is any nonnegative function satisfying the following two conditions:
1. φ is symmetric about zero, nondecreasing for x ≥ 0, φ(0) = 0 and φ(x₀) > 0 for some x₀.
2. For all x, φ(x) ≤ M e^{c|x|} for some M > 0 and c ∈ (0, 0.5).

The first condition is commonly assumed for loss functions. The second condition limits the growth of the loss function so that the risk function is finite. Note that if φ(x) ≥ M e^{0.5|x|} for all large x and some M > 0, then the risk function of the MLE for our sequential sampling problem can be shown to be infinite, and consequently the MLE is inadmissible. Thus the second condition does not pose a serious restriction for our problem.
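As an illustration (these particular φ's and constants are our own examples, not prescribed by the paper), both ordinary squared error and a clipped, bounded, non-convex variant belong to this class; a quick numerical spot-check of the two conditions on a grid might look as follows:

```python
import math

def phi_sq(x):
    """Squared error: unbounded and convex, a member of the class."""
    return x * x

def phi_clipped(x):
    """Bounded and non-convex, yet still in the class."""
    return min(x * x, 1.0)

def in_class(phi, M=10.0, c=0.4, grid=None):
    """Spot-check conditions 1 and 2 on a grid of points: phi(0) = 0,
    phi positive somewhere, symmetric, nondecreasing on [0, inf),
    and bounded by M * exp(c|x|) with c in (0, 0.5)."""
    if grid is None:
        grid = [i / 10.0 for i in range(0, 200)]
    ok = phi(0.0) == 0.0 and any(phi(x) > 0 for x in grid)
    for a, b in zip(grid, grid[1:]):
        ok &= phi(a) <= phi(b) + 1e-12          # nondecreasing for x >= 0
        ok &= abs(phi(a) - phi(-a)) < 1e-12     # symmetric about zero
        ok &= phi(a) <= M * math.exp(c * a)     # growth bound, condition 2
    return ok
```

A grid check of this kind is of course only a sanity test, not a proof of membership.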

Z.D. Bai, M. Chow / Statistics & Probability Letters 47 (2000) 159 – 164

161

The risk function of a decision rule δ is

R(θ, δ) = E_θ[L(θ, δ)].   (2.2)

The sample space for our sequential sampling problem is the set of all nonnegative integers. This is assumed in the proof of the following lemma.

Lemma 1. For the loss function considered in (2.1), suppose that the parameter space Θ is unbounded above and that the estimator δ(k) ↑ ∞ as k → ∞, where k is a sufficient statistic for θ ∈ Θ. If there exists a positive constant γ < 1 such that R(θ, γδ) ≤ R(θ, δ) for all sufficiently large θ, with strict inequality for infinitely many large θ, then δ is inadmissible.

Proof. From the given assumptions, there exists a positive integer k₁ such that δ(k₁) > 0 and R(θ, γδ) ≤ R(θ, δ) for all θ > δ̄(k₁), where δ̄(k₁) = [(1 + γ)/2] δ(k₁). A new estimator δ₁ is constructed as follows:

δ₁(k) = δ(k) if k < k₁;   δ₁(k) = γδ(k) if k ≥ k₁,

which will be shown to dominate δ for all θ. We first show that

φ((δ(k) − θ)/θ) − φ((γδ(k) − θ)/θ) ≥ 0 if k ≥ k₁ and θ ≤ δ̄(k₁),
φ((δ(k) − θ)/θ) − φ((γδ(k) − θ)/θ) ≤ 0 if k < k₁ and θ > δ̄(k₁).

If k ≥ k₁, θ ≤ δ̄(k₁) and γδ(k) ≥ θ, then φ((δ(k) − θ)/θ) − φ((γδ(k) − θ)/θ) ≥ 0, which follows from the fact that δ(k) − θ ≥ γδ(k) − θ ≥ 0 and that φ is nondecreasing on the right half line. Now consider the case where k ≥ k₁, θ ≤ δ̄(k₁) and γδ(k) < θ. Because

(1 + γ)δ(k) ≥ (1 + γ)δ(k₁) = 2δ̄(k₁) ≥ 2θ,

we obtain δ(k) − θ ≥ θ − γδ(k) > 0. By the symmetry of φ, this implies that φ((δ(k) − θ)/θ) − φ((γδ(k) − θ)/θ) ≥ 0.

Now we prove the second assertion. Assume that k < k₁ and θ > δ̄(k₁). If δ(k) ≤ θ, then φ((δ(k) − θ)/θ) − φ((γδ(k) − θ)/θ) ≤ 0, which follows from the fact that γδ(k) − θ ≤ δ(k) − θ ≤ 0 and that φ is nonincreasing for negative arguments. If δ(k) > θ, then φ((δ(k) − θ)/θ) − φ((γδ(k) − θ)/θ) ≤ 0, which follows from the fact that 0 < δ(k) − θ ≤ θ − γδ(k). The last inequality is obtained from

(1 + γ)δ(k) ≤ (1 + γ)δ(k₁) = 2δ̄(k₁) < 2θ.

Therefore, for θ ≤ δ̄(k₁),

E_θ[φ((δ₁(k) − θ)/θ)] = E_θ[φ((δ(k) − θ)/θ) I(k < k₁)] + E_θ[φ((γδ(k) − θ)/θ) I(k ≥ k₁)]
                     ≤ E_θ[φ((δ(k) − θ)/θ)],   (2.3)

where I(E) is the indicator function of E, which takes the value 1 at each point of E and 0 at each point not in E. Finally, if θ > δ̄(k₁), then

E_θ[φ((δ₁(k) − θ)/θ)] = E_θ[φ((δ(k) − θ)/θ) I(k < k₁)] + E_θ[φ((γδ(k) − θ)/θ) I(k ≥ k₁)]
                     ≤ E_θ[φ((γδ(k) − θ)/θ)] ≤ E_θ[φ((δ(k) − θ)/θ)],   (2.4)


where the last inequality in (2.4) holds strictly for infinitely many large θ. This completes the proof of Lemma 1.

Now we consider the sequential sampling scheme where the stopping rule is to sample until a fixed number, B, of marked items has been observed. Let W denote the number of unmarked items sampled and T the total number of items sampled; clearly T = W + B. The distribution of T is (see Samuel, 1969)

P_N(T = t) = (N)_{t−B} (t − B) S_{t−1}^{(t−B)} / N^t,   t = B + 1, …, B + N,   (2.5)

where (N)_k = N(N − 1) ⋯ (N − k + 1) and S_n^{(m)} = (1/m!) Σ_{j=0}^{m} (−1)^{m−j} C(m, j) j^n is a Stirling number of the second kind. One may obtain the MLE of N, N̂, by maximizing the likelihood function defined above with respect to N. In the following lemma, we provide a formula for the asymptotic risk of γN̂ for all γ such that 0 < γ < 1/(2c). The proof is very similar to that of Lemma 2 in Bai and Chow (1991) and is omitted here.

Lemma 2. If, for all x, φ(x) ≤ M e^{c|x|} for some M > 0 and c ∈ (0, 0.5), then

lim_{N→∞} E_N[φ((γN̂ − N)/N)] = ∫₀^∞ φ(γx/(2B) − 1) χ²_{2B}(x) dx,   (2.6)

where χ²_{2B} is the probability density function of the chi-square distribution with 2B degrees of freedom and γ is a constant satisfying 0 < γ < 1/(2c).
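For small N and B, the distribution (2.5) and the MLE can be evaluated directly. The sketch below is our own code (plain float arithmetic, adequate only for small values), using the explicit Stirling-number sum quoted above; the grid bound `N_max` is an assumption of ours, not part of the paper:

```python
from math import comb, factorial

def stirling2(n, m):
    """Stirling number of the second kind S_n^(m), via the explicit sum
    (1/m!) * sum_{j=0}^{m} (-1)^(m-j) * C(m, j) * j^n."""
    return sum((-1) ** (m - j) * comb(m, j) * j ** n
               for j in range(m + 1)) // factorial(m)

def falling(N, k):
    """Falling factorial (N)_k = N (N - 1) ... (N - k + 1)."""
    out = 1
    for i in range(k):
        out *= N - i
    return out

def pmf_T(N, B, t):
    """P_N(T = t) from (2.5); nonzero for t = B + 1, ..., B + N."""
    w = t - B  # number of unmarked items sampled
    return falling(N, w) * w * stirling2(t - 1, w) / N ** t

def mle(B, t, N_max=1000):
    """MLE of N given the observed total sample size t: argmax of the
    likelihood (2.5) over a finite grid of N, enough for small examples."""
    return max(range(t - B, N_max + 1), key=lambda N: pmf_T(N, B, t))
```

For instance, with N = 2 and B = 1 the two attainable values t = 2, 3 each have probability 1/2, and for any N and B the probabilities sum to one over t = B + 1, …, B + N.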

Let

g(γ) = ∫₀^∞ φ(γx/(2B) − 1) χ²_{2B}(x) dx = (2B/γ) ∫_{−1}^∞ φ(y) χ²_{2B}(2B(y + 1)/γ) dy.   (2.7)

It follows from Lemma 2 that g(γ) is the limiting risk of the estimator γN̂ for all γ such that 0 < γ < 1/(2c).

Lemma 3. For γ such that 0 < γ < 1/(2c), g(γ) is differentiable and g′(1) > 0.

Proof. The differentiability of g follows from the dominated convergence theorem. Also, since

g′(γ) = ∫_{−1}^∞ φ(y) (∂/∂γ)[(2B/γ) χ²_{2B}(2B(y + 1)/γ)] dy,   (2.8)

it is easy to verify that

(∂/∂γ)[(2B/γ) χ²_{2B}(2B(y + 1)/γ)] = −(∂/∂y)[(2B/γ) χ²_{2B+2}(2B(y + 1)/γ)].   (2.9)

Thus

g′(γ) = −∫_{−1}^∞ φ(y) d[(2B/γ) χ²_{2B+2}(2B(y + 1)/γ)],   (2.10)

where the differential is taken with respect to y.


By the above formula and integration by parts, we obtain

g′(1) = [∫_{−1}^∞ φ(y) (∂/∂γ)[(2B/γ) χ²_{2B}(2B(y + 1)/γ)] dy]_{γ=1}
      = −∫_{−1}^∞ φ(y) d[2B χ²_{2B+2}(2B(y + 1))]
      = 2B ∫_{−1}^∞ χ²_{2B+2}(2B(y + 1)) dφ(y)
      = 2B ∫₀^1 [χ²_{2B+2}(2B(1 + y)) − χ²_{2B+2}(2B(1 − y))] dφ(y) + 2B ∫₁^∞ χ²_{2B+2}(2B(1 + y)) dφ(y),   (2.11)

where the last step uses the symmetry of φ to fold the range (−1, 0) onto (0, 1).

The last term in the above expression is obviously nonnegative. In order to show that g′(1) > 0, we therefore only need to show that, for y ∈ (0, 1),

χ²_{2B+2}(2B(1 + y)) − χ²_{2B+2}(2B(1 − y)) > 0.   (2.12)

Define

h(y) = χ²_{2B+2}(2B(1 + y)) / χ²_{2B+2}(2B(1 − y)).   (2.13)

It suffices to show that

h(y) > 1   for all y ∈ (0, 1).   (2.14)

Note that h(0) = 1, and (2.14) follows from the fact that, for all y ∈ (0, 1),

h′(y) = 2B ((1 + y)/(1 − y))^{B−1} (y²/(1 − y)²) e^{−2By} > 0.   (2.15)
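Inequality (2.12), equivalently (2.14), is easy to spot-check numerically. The sketch below is our own code, with the chi-square density written out from its standard formula; writing out h(y) from (2.13) gives the simplified closed form ((1 + y)/(1 − y))^B e^{−2By} whose derivative appears in (2.15):

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    k = df / 2.0
    return x ** (k - 1) * math.exp(-x / 2.0) / (2.0 ** k * math.gamma(k))

def h(y, B):
    """The ratio (2.13): chi2_{2B+2} density at 2B(1+y) over at 2B(1-y)."""
    df = 2 * B + 2
    return chi2_pdf(2 * B * (1 + y), df) / chi2_pdf(2 * B * (1 - y), df)

def h_closed_form(y, B):
    """Simplified form of (2.13): ((1+y)/(1-y))^B * exp(-2By)."""
    return ((1 + y) / (1 - y)) ** B * math.exp(-2 * B * y)
```

Both versions agree and exceed 1 throughout (0, 1) for any B tried, consistent with (2.12).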

Theorem 1. The MLE N̂ is inadmissible under the class of loss functions defined in (2.1).

Proof. Since g′(1) > 0, we may choose γ₀ < 1 such that g(γ₀) < g(1). Note that g(1) and g(γ₀) are the limiting risks of the MLE N̂ and of the estimator γ₀N̂, respectively. We may thus find an integer N₀ such that for all N ≥ N₀,

E_N[φ((γ₀N̂ − N)/N)] < E_N[φ((N̂ − N)/N)].   (2.16)

By Lemma 1, this completes the proof of the inadmissibility of the MLE N̂ under the loss function φ.

3. Optimal choices of γ for different loss functions

In the previous section, we showed that the MLE is inadmissible under a wide range of loss functions. The competing estimators γN̂ differ under different loss functions. In this section, we illustrate the method for obtaining the optimal choice of γ under different loss functions. In particular, the optimal choice of γ under the L₁ norm will be given.
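The method can be sketched numerically: approximate the limiting risk g(γ) from (2.7) by quadrature and minimize over γ. The code below is our own illustration (stdlib only; the truncation bound, grid, and helper names are our assumptions); for φ(x) = x² the minimizer should reproduce γ = B/(B + 1), as in Remark 2 below.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    k = df / 2.0
    return x ** (k - 1) * math.exp(-x / 2.0) / (2.0 ** k * math.gamma(k))

def g(gamma, B, phi, upper=None, steps=4000):
    """Trapezoidal approximation of the limiting risk from (2.7):
    g(gamma) = int_0^inf phi(gamma * x / (2B) - 1) chi2_{2B}(x) dx."""
    upper = upper if upper is not None else 20 * B + 60  # truncate the chi2 tail
    hstep = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * hstep
        w = 0.5 if i in (0, steps) else 1.0
        total += w * phi(gamma * x / (2 * B) - 1.0) * chi2_pdf(x, 2 * B)
    return total * hstep

def optimal_gamma(B, phi, lo=0.5, hi=1.5, iters=40):
    """Ternary search for the minimizer of g over [lo, hi]; adequate here
    because g is convex in gamma for convex phi such as x**2 or |x|."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1, B, phi) < g(m2, B, phi):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```

For B = 4 and φ(x) = x² this returns approximately 0.8 = B/(B + 1); with φ(x) = |x| it returns a value near 0.856, matching the median-based formula of Remark 3.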


Remark 1. If φ is differentiable (except at finitely many points) and satisfies the conditions in (2.1), then the optimal choice of γ should satisfy

g′(γ) = 0,   (3.1)

or, more explicitly,

∫₀^∞ φ′(γx/(2B) − 1) χ²_{2B+2}(x) dx = 0.   (3.2)

Remark 2. If φ(x) = x², then the solution to (3.1) is γ = B/(B + 1). This case has been discussed in detail in Bai and Chow (1991).

Remark 3. If φ(x) = |x|, then the solution to (3.1) is γ = B/m(B + 1), where m(B) denotes the median of the distribution Γ_B with density Γ_B(x) = x^{B−1} e^{−x}/(B − 1)!, the Gamma distribution with parameter B.

Acknowledgements

The work of the second author was partially supported by NSF Grant DMS 97-09481. We want to thank Professor George A.F. Seber and Professor Duncan K.H. Fong for their helpful comments.

References

Bai, Z.D., Chow, M., 1991. Inadmissibility of the MLE in the sequential estimation of the size of a population. Biometrika 78, 817–823.
Berger, J.O., 1985. Statistical Decision Theory and Bayesian Analysis, 2nd Edition. Springer, New York.
Brown, L.D., 1975. Estimation with incompletely specified loss functions (the case of several location parameters). J. Amer. Statist. Assoc. 70, 417–427.
Brown, L.D., Chow, M., Fong, D., 1992. On the admissibility of the maximum likelihood estimator of the binomial variance. Canad. J. Statist. 20, 353–358.
Chapman, D.G., 1954. The estimation of biological populations. Ann. Math. Statist. 25, 1–15.
Chow, M., 1990. Admissibility of the MLE for simultaneous estimation in the negative binomial problems. J. Multivariate Anal. 33, 212–219.
Chow, M., Fong, D., 1992. Simultaneous estimation of the Hardy–Weinberg proportions. Canad. J. Statist. 20, 291–296.
Darling, D.A., Robbins, H., 1967. Finding the size of a finite population. Ann. Math. Statist. 38, 1392–1398.
Darroch, J.N., 1958. The multiple recapture census. I. Estimation of a closed population. Biometrika 45, 343–359.
Goodman, L.A., 1949. On the estimation of the number of classes in a population. Ann. Math. Statist. 20, 572–579.
Goodman, L.A., 1953. Sequential sampling tagging for population size problems. Ann. Math. Statist. 24, 56–59.
Hwang, J.T., 1982. Improving upon standard estimators in discrete exponential families with applications to Poisson and negative binomial cases. Ann. Statist. 10, 857–867.
Leite, J.G., Oishi, J., Pereira, C.A. de B., 1988. A note on the exact maximum likelihood estimation of the size of a finite and closed population. Biometrika 75, 178–180.
Samuel, E., 1968. Sequential maximum likelihood estimation of the size of a population. Ann. Math. Statist. 39, 1057–1068.
Samuel, E., 1969. Comparison of sequential rules for estimation of the size of a population. Biometrics 25, 517–527.
Seber, G.A.F., 1985. The Estimation of Animal Abundance, 2nd Edition. Griffin, London.