On the ML-estimator of the positive and negative two-parameter binomial distribution


Statistics & Probability Letters 33 (1997) 129-134

Carlo Ferreri

Dipartimento di Scienze Statistiche, Università degli Studi di Bologna, via Belle Arti 41, 40126 Bologna, Italy

Received May 1995; revised June 1996

Abstract

Conditions of existence and uniqueness of the maximum likelihood estimator are discussed for a two-parameter hyperbinomial distribution linking the positive binomial distribution with the negative binomial distribution in terms of a real non-Poissonian dispersion parameter.

Keywords: Maximum likelihood estimator; Positive binomial distribution; Negative binomial distribution; Hyperbinomial distribution

1. Introduction

As the references in Clark and Perry (1989) and in Aragón et al. (1992) show, the manifold uses of both the negative binomial distribution (NBD) and the positive binomial distribution (PBD) have prompted a number of papers devoted to the estimation of their two parameters. But, while the theorem of Aragón et al. on the existence and uniqueness of the maximum likelihood estimator (MLE) of the two parameters of the NBD can be seen as conclusive, no similar result has yet been achieved for the PBD. The counterexample in DeRiggi (1994), who gave sufficient conditions for unimodality of the positive binomial likelihood function, is in fact indicative of an open matter. Here, our intent is to solve the problem fully. We therefore propose to deal with the ML-estimation of the parameters $\mu$ and $\alpha$ of the probability distribution

$$P_x = \binom{-1/\alpha}{x} (1+\alpha\mu)^{-1/\alpha} \left(\frac{-\alpha\mu}{1+\alpha\mu}\right)^{x}, \quad \text{with } x = 0, 1, \ldots, \kappa, \tag{1.1}$$

where $\kappa = \infty$ for $\alpha > 0$ and $\kappa = [-1/\alpha]$ for $\alpha < 0$, $[-1/\alpha]$ indicating the integer part of $-1/\alpha$. Since $\alpha$ denotes a real number, (1.1) is named the hyperbinomial distribution (HBD). It includes both the previous models. In fact:

(i) when $\alpha > 0$, (1.1) reduces to the NBD, thanks to the well-known relation

$$\binom{-1/\alpha}{x} = (-1)^{x} \binom{1/\alpha + x - 1}{x};$$

(ii) for $\alpha \to 0$, it tends to the Poisson distribution (PD);

(iii) when $\alpha < 0$, (1.1) generally identifies, instead, with an incomplete PBD because it implies $\sum_{x=0}^{\kappa} P_x \le 1$, where the equal sign is valid only for $\kappa = -1/\alpha$. In this particular situation, (1.1) obviously reduces to the usual BD(N, p) by setting $N = \kappa$ and $p = \mu/\kappa$.

Nevertheless, the following form of the PBD($\kappa$, $\mu$) (see Binet, 1986, and references therein)

$$P_x = \binom{\kappa}{x} \left(\frac{\mu}{\kappa}\right)^{x} \left(1 - \frac{\mu}{\kappa}\right)^{\kappa - x}, \quad x = 0, 1, \ldots, \kappa, \tag{1.2}$$

which ensues from (1.1) when $\kappa = -1/\alpha$ is a natural number, is considered here for reasons of homogeneity. Of course, when $\kappa < -1/\alpha$, the distribution (1.1) can be completed by the term $P_{\kappa+1} = 1 - \sum_{x=0}^{\kappa} P_x$. If the value of this term is negligible, as it often is in practice, then $G(z) = [1 + \alpha\mu - \alpha\mu z]^{-1/\alpha}$ can be seen as a probability generating function (p.g.f.) of (1.1) also for $\alpha < 0$ and, hence, the expressions

$$E(X) = \mu, \qquad \mathrm{Var}(X) = \sigma^2 = \mu(1 + \alpha\mu)$$

can be assumed, to the same order of approximation, for a r.v. X distributed according to (1.1).

Footnote: This work was supported by MURST-Bologna University, 60% grant for 1995.
0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-7152(96)00120-4
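As a numerical illustration of the properties listed above, the following minimal Python sketch evaluates the terms of (1.1) through the ratio $P_{x+1}/P_x = (1+\alpha x)\mu / [(x+1)(1+\alpha\mu)]$, which follows from the binomial-coefficient form of (1.1). The function names, the parameter values and the truncation point `x_max = 400` are assumptions made for illustration only; they do not come from the paper.

```python
def hbd_pmf(mu, alpha, x_max):
    """Terms P_0..P_{x_max} of (1.1), built with the ratio recurrence
    P_{x+1} = P_x * (1 + alpha*x) * mu / ((x + 1) * (1 + alpha*mu))."""
    p = (1.0 + alpha * mu) ** (-1.0 / alpha)  # P_0
    terms = [p]
    for x in range(x_max):
        p *= (1.0 + alpha * x) * mu / ((x + 1) * (1.0 + alpha * mu))
        terms.append(p)
    return terms


def moments(terms):
    """Total mass, mean and variance of a (possibly incomplete) list of terms."""
    total = sum(terms)
    mean = sum(x * p for x, p in enumerate(terms))
    var = sum(x * x * p for x, p in enumerate(terms)) - mean ** 2
    return total, mean, var


# alpha > 0: NBD case; the infinite support is truncated at an assumed x_max.
nbd_total, nbd_mean, nbd_var = moments(hbd_pmf(mu=2.0, alpha=0.5, x_max=400))

# alpha = -1/4: kappa = 4 exactly, i.e. the complete binomial BD(4, mu/4).
bd_total, bd_mean, bd_var = moments(hbd_pmf(mu=1.0, alpha=-0.25, x_max=4))

# alpha = -0.3: kappa = [1/0.3] = 3, so the distribution is incomplete.
inc_total, _, _ = moments(hbd_pmf(mu=1.0, alpha=-0.3, x_max=3))

print(nbd_mean, nbd_var)   # close to mu and mu * (1 + alpha * mu)
print(bd_total, inc_total) # complete versus incomplete total mass
```

In the two complete cases the computed mean and variance agree with $\mu$ and $\mu(1+\alpha\mu)$, while in the third case the total mass falls slightly short of 1, which is the gap that the completing term $P_{\kappa+1}$ is meant to fill.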

2. ML-estimator

Suppose $D = \{(x, n_x);\ x = 0, 1, \ldots, c\}$, with $n = \sum_{x=0}^{c} n_x$, represents the frequency distribution of the n units (workers, for example) of a random sample with respect to the number x of events of interest (accidents) which happened to each one in a time period, $c = \max(x)$ being the greatest value of x observed. With count data in this form, if the model (1.1) is assumed, it is easy to verify (Ferreri, 1996) that the first derivatives of the log-likelihood function $l(\mu, \alpha)$ lead firstly to the estimator $\hat\mu = M$, M denoting the sample mean

$$M = \sum_{x=0}^{c} x f_x = \sum_{h=0}^{c-1} (1 - F_h), \quad \text{where } F_h = \sum_{x=0}^{h} f_x, \tag{2.1}$$

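and $f_x$ indicates the proportion $n_x/n$ of the sample units to which x events occurred. In terms of M, the following ML-equation of $\alpha$

$$\phi(\alpha) = \phi_1(\alpha) - \phi_2(\alpha) = 0 \tag{2.2}$$

ensues, where

$$\phi_1(\alpha) = -\alpha \sum_{h=0}^{c-1} \frac{1 - F_h}{1 + \alpha h}, \quad \phi_1'(\alpha) = -\sum_{h=0}^{c-1} \frac{1 - F_h}{(1 + \alpha h)^2}, \quad \phi_1''(\alpha) = 2 \sum_{h=0}^{c-1} \frac{(1 - F_h)\,h}{(1 + \alpha h)^3}, \tag{2.3}$$

$$\phi_2(\alpha) = -\log(1 + \alpha M), \quad \phi_2'(\alpha) = -\frac{M}{1 + \alpha M}, \quad \phi_2''(\alpha) = \frac{M^2}{(1 + \alpha M)^2}, \tag{2.4}$$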

with $\lim_{\alpha\to+\infty} \phi_1(\alpha) = -\infty$, $\lim_{\alpha\to+\infty} \phi_2(\alpha) = -\infty$ and $\lim_{\alpha\to+\infty} \phi_1(\alpha)/\phi_2(\alpha) = \infty$. By taking $M_2 = \sum_{x=0}^{c} x^2 f_x = 2\sum_{h=0}^{c-1} h(1 - F_h) + M$ and $\hat\sigma^2 = M_2 - M^2$ into account, the expression

$$\phi(\alpha) = \alpha^2 \left[ (\hat\sigma^2 - M)/2 + O(\alpha) \right] \tag{2.5}$$

can moreover be obtained (Ferreri, 1996) by the Maclaurin expansions of $1/(1 + \alpha h)$ in $\phi_1(\alpha)$ and of $\log(1 + \alpha M)$ in $\phi_2(\alpha)$, where $\lim_{\alpha\to 0} O(\alpha)/\alpha = M^3/3 - \sum_{h=0}^{c-1} h^2 (1 - F_h)$. The function $\phi(\alpha)$, having a zero of order 2 at $\alpha = 0$, has therefore to be considered for

$$\alpha > \max\left(-1/(c-1),\ -1/M\right) \quad \text{with } c > 1, \tag{2.6}$$

where $\alpha > -1/(c-1)$ is imposed by the discontinuities of $\phi_1(\alpha)$ and by the restriction $\kappa \ge c$ if $-1/\alpha$ is a natural number, or by $\kappa + 1 \ge c$ if $\kappa$ denotes only the integer part of $-1/\alpha$. Of course, for c = 1, (2.2) would only have the root $\alpha_0 = 0$.

In the light of (2.3) and (2.4), $\phi(\alpha)$ can be regarded as the difference between two functions which are monotonically decreasing, concave upward and with a contact of order 2 at $\alpha = 0$, where they have the common tangent $y = -M\alpha$. Then (2.5) shows that, in a neighbourhood of $\alpha = 0$, these functions are such that $\phi_1(\alpha) > \phi_2(\alpha)$ or $\phi_1(\alpha) < \phi_2(\alpha)$ according to $\hat\sigma^2 > M$ or $\hat\sigma^2 < M$; likewise, $\phi(\alpha)$ and $\phi''(\alpha)$ have the same sign as $\hat\sigma^2 - M$, whereas $\phi'(\alpha)$ has the opposite sign on the left of $\alpha = 0$ and the same sign on the right.

Therefore, when $\hat\sigma^2 > M$, the curve $\phi_2(\alpha)$, which is below $\phi_1(\alpha)$ for $\alpha < 0$ and in a neighbourhood of $\alpha = 0$, remains below $\phi_1(\alpha)$ as $\alpha$ increases positively only until a positive value $\alpha_0$ at which it is intersected by $\phi_1(\alpha)$, since the limits given above imply $\lim_{\alpha\to+\infty} \phi(\alpha) = -\infty$. Thus, just as in Aragón et al. (1992), when $\hat\sigma^2 > M$, Eq. (2.2) always has a unique positive root $\alpha_0$.

Instead, when $\hat\sigma^2 < M$, the two situations $M < c - 1$ and $M > c - 1$, equivalent to $\sum_{h=0}^{c-1} F_h > 1$ and $\sum_{h=0}^{c-1} F_h < 1$, respectively, are to be distinguished. In fact,

(I) if $M < c - 1$, i.e. $\sum_{h=0}^{c-1} F_h > 1$, as is more often the case in practice, the curve $\phi_2(\alpha)$ is above $\phi_1(\alpha)$ for $\alpha > 0$ and, as $\alpha$ decreases negatively, remains above until a value $\alpha_0$ of the interval $(-1/(c-1), 0)$ at which $\phi_2(\alpha)$ is intersected once only by $\phi_1(\alpha)$ (as claimed already in Ferreri, 1996), given that, for $\alpha$ going to $-1/(c-1)$ from $\alpha_0$, the convex function $\phi_1(\alpha)$ increases to $+\infty$ from $\phi_1(\alpha_0) > 0$ with $\phi_1'(\alpha_0) < \phi_2'(\alpha_0) < 0$, while the analogous function $\phi_2(\alpha)$ increases monotonically only to $-\log(1 - M/(c-1)) < +\infty$ from the same value. This can be elucidated also by extending the proof which Aragón et al. (1992) gave for $\hat\sigma^2 > M$ (see Appendix). Thus, when $\hat\sigma^2 < M$ with $M < c - 1$, Eq. (2.2) always has a unique negative solution $\alpha_0 > -1/(c-1)$.

(II) if $M = c - 1$, that is $\sum_{h=0}^{c-1} F_h = 1$, then $\lim_{\alpha\to -1/(c-1)} \phi_1(\alpha)/\phi_2(\alpha) = +\infty$ follows; hence Eq. (2.2) has only one negative root as in case (I), just as DeRiggi (1983) proved for the circumstance in which the sample mean M is an integer.

(III) if $M > c - 1$, that is $\sum_{h=0}^{c-1} F_h < 1$, (2.6) reduces to $\alpha > -1/M$ and the curve $\phi_2(\alpha)$, which is above $\phi_1(\alpha)$ for $\alpha > 0$ and in a neighbourhood of $\alpha = 0$, either does not intersect the curve $\phi_1(\alpha)$ as $\alpha$ decreases negatively or intersects it twice. In fact, as $\alpha$ goes from 0 to $-1/M$, $\phi_2(\alpha)$ can always remain above $\phi_1(\alpha)$. But, if $\phi_2(\alpha)$ intersects $\phi_1(\alpha)$ once at an $\alpha_0$, where $\phi_1'(\alpha_0) < \phi_2'(\alpha_0) < 0$, then it has to intersect $\phi_1(\alpha)$ again at a value $\alpha_0' \in (-1/M, \alpha_0)$, given that, for $\alpha$ going to $-1/M$ from $\alpha_0$, the convex function $\phi_2(\alpha)$ tends to $+\infty$, while $\phi_1(\alpha)$ increases monotonically only to $\phi_1(-1/M)$ from $\phi_1(\alpha_0) = \phi_2(\alpha_0) > 0$. These cases too can be elucidated by the aforesaid extension, to negative values of $\alpha$, of the proof of Aragón et al. (1992) (see Appendix). Thus, when $\hat\sigma^2 < M$ with $M > c - 1$, Eq. (2.2) has either no solution or two negative roots in $(-1/M, 0)$, the greater of which gives the ML-estimator. In such a case, if a value solving (2.2) is found, another root has, therefore, to be searched for.

It is worth noting that $M > c - 1$ generally involves $\hat\sigma^2 < M$, as is in effect the case for $\sum_{h=0}^{c-1} F_h \le 0.5$. We think these situations are so immediately recognizable that, firstly, they make the DeRiggi (1994) conditions unnecessary and, secondly, they allow one to easily distinguish the case in which the ML-equation of $\alpha$ admits only one negative solution when $\hat\sigma^2 < M$, given that DeRiggi also proposed to find a case involving an equation, in $-1/\alpha$, with "more than one root on the $[c, \infty)$".
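The quantities used in this discussion are easy to compute from a frequency table. The sketch below, offered only as an illustration under stated assumptions, evaluates $\phi(\alpha)$ from (2.2)-(2.4), checks the leading term of the expansion (2.5) numerically, and assigns a sample to the overdispersed case or to one of the situations (I)-(III). The function names are hypothetical, and the counts are those of the distributions D1 to D6 tabulated later in this section.

```python
import math

# Frequencies n_0, n_1, n_2, n_3 of the distributions D1..D6 from the paper's table.
SAMPLES = {
    "D1": [66, 25, 8, 1], "D2": [56, 35, 7, 2], "D3": [25, 10, 5, 60],
    "D4": [1, 8, 25, 66], "D5": [24, 1, 1, 74], "D6": [24, 2, 1, 73],
}


def tail_fractions(counts):
    """Return 1 - F_h for h = 0..c-1, given the frequencies n_0..n_c."""
    n = sum(counts)
    tails, cum = [], 0
    for n_x in counts[:-1]:
        cum += n_x
        tails.append(1.0 - cum / n)
    return tails


def phi(alpha, counts):
    """phi(alpha) = phi_1(alpha) - phi_2(alpha), as in (2.2)-(2.4)."""
    tails = tail_fractions(counts)
    m = sum(tails)  # sample mean M, by (2.1)
    phi1 = -alpha * sum(t / (1.0 + alpha * h) for h, t in enumerate(tails))
    phi2 = -math.log1p(alpha * m)
    return phi1 - phi2


def situation(counts):
    """Classify a sample; integer arithmetic keeps the comparisons exact."""
    n, c = sum(counts), len(counts) - 1
    s1 = sum(x * n_x for x, n_x in enumerate(counts))      # n * M
    s2 = sum(x * x * n_x for x, n_x in enumerate(counts))  # n * M_2
    excess = n * s2 - s1 * s1 - n * s1  # n^2 * (sigma^2 - M)
    if excess > 0:
        return "overdispersed"
    if excess == 0:
        return "equidispersed"
    if s1 < (c - 1) * n:
        return "I"
    if s1 == (c - 1) * n:
        return "II"
    return "III"


labels = {name: situation(counts) for name, counts in SAMPLES.items()}

# Leading term of (2.5) for D1: phi(alpha) / alpha^2 should approach
# (sigma^2 - M) / 2 = (0.4664 - 0.44) / 2 = 0.0132 as alpha -> 0.
ratio_d1 = phi(1e-3, SAMPLES["D1"]) / 1e-3 ** 2

print(labels)
print(ratio_d1)
```

The classification depends only on the first two sample moments and on c, which is why the text above can call these situations immediately recognizable.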
Obviously, DeRiggi's counterexample regards the frequency distribution specified by $f_0 = 0.0305$, $f_1 = f_2 = \cdots = f_{c-1} = 0$ and $f_c = 0.9695$ with c = 20, from which $\hat\sigma^2 = 11.8279$ and $M = 19.3900 > c - 1$ ($\sum_{h=0}^{c-1} F_h = 0.5785 < 1$) ensue. In accordance with what situation (III) can involve, (2.2) has therefore the two roots $\tilde\alpha = -0.049843$ and $\alpha_0 = \hat\alpha = -0.048689$, such that $-1/\alpha_0$ and $-1/\tilde\alpha$ fall within the disjoint intervals specified in DeRiggi (1994).

The following frequency distributions $D_j$:

| | D1 | D2 | D3 | D4 | D5 | D6 |
|---|---|---|---|---|---|---|
| $n_{0j}$ | 66 | 56 | 25 | 1 | 24 | 24 |
| $n_{1j}$ | 25 | 35 | 10 | 8 | 1 | 2 |
| $n_{2j}$ | 8 | 7 | 5 | 25 | 1 | 1 |
| $n_{3j}$ | 1 | 2 | 60 | 66 | 74 | 73 |

have been chosen not only to illustrate the situations discussed but above all to elucidate their heuristic ground. In fact, the first three distributions represent the most common cases leading to only one root $\alpha_0 = \hat\alpha$ for (2.2): D1 shows an overdispersion which involves $\hat\alpha = 0.1640$, while D2 and D3 give $\hat\sigma^2 < M$ with $M < c - 1$ and $M = c - 1$, respectively, to which $\hat\alpha = -0.1366$ and $\hat\alpha = -0.1773$ correspond. Instead, the last three distributions yield examples of situation (III) because they present an underdispersion with $M > c - 1$, which leads D4 and D5 to have no negative root and D6 to imply, instead, Eq. (2.2) with the two negative roots $\tilde\alpha = -0.4229 > -1/M = -0.4484$ and $\alpha_0 = \hat\alpha = -0.3506$. But D4 and D5 also show that, in such a situation, the absence of a negative root can occur in very different cases (in Olkin et al. (1981) the ML-equation is considered for $\alpha \ge -1/c$ instead of (2.6)); in fact, D4 draws attention since it does not involve an ML-estimate although it is "symmetric" to the D1-distribution. Then the fact that D6 follows from D5 by moving one unit from $n_3$ to $n_1$ indicates that, especially in situation (III), the ML-estimator of $\alpha$ has to be seen as particularly unstable, just as Olkin et al. (1981) noticed.

Of course, one could correctly object that, when $\hat\sigma^2 < M$, the function $l(\mu, \alpha)$ used can be considered as a log-likelihood function only to the extent that (1.1) approximates a complete distribution. It has to be noted, however, that the same thing always arises when the log-likelihood function $l(\mu, \kappa)$ obtained from (1.2), or that implied by a BD(N, p), is regarded as continuous also with respect to the parameter $\kappa$, or N, respectively. In practice, therefore, the first problem that needs to be solved regards the choice between model (1.1), having $\alpha$ as a continuous parameter, and model (1.2).
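The roots quoted for D1 to D6 can be checked with a simple grid scan followed by bisection on every sign change of $\phi(\alpha)$ over the admissible range (2.6). This is a minimal sketch, not the procedure used in the paper: the grid size, the upper cap `upper = 50.0`, and the small gap left around the double zero at $\alpha = 0$ are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

SAMPLES = {
    "D1": [66, 25, 8, 1], "D2": [56, 35, 7, 2], "D3": [25, 10, 5, 60],
    "D4": [1, 8, 25, 66], "D5": [24, 1, 1, 74], "D6": [24, 2, 1, 73],
}


def phi(alpha, counts):
    """phi(alpha) = phi_1(alpha) - phi_2(alpha), as in (2.2)-(2.4)."""
    n = sum(counts)
    tails, cum = [], 0
    for n_x in counts[:-1]:
        cum += n_x
        tails.append(1.0 - cum / n)  # 1 - F_h
    m = sum(tails)                   # sample mean M, by (2.1)
    phi1 = -alpha * sum(t / (1.0 + alpha * h) for h, t in enumerate(tails))
    return phi1 + math.log1p(alpha * m)


def bisect(f, lo, hi, iters=100):
    """Plain bisection on a bracket [lo, hi] where f changes sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scan(f, lo, hi, grid):
    """Locate every sign change of f on a uniform grid and refine it."""
    xs = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    vals = [f(x) for x in xs]
    return [bisect(f, xs[i], xs[i + 1])
            for i in range(grid) if vals[i] * vals[i + 1] < 0]


def ml_roots(counts, grid=4000, upper=50.0, gap=1e-6):
    """Non-zero roots of (2.2) inside the range (2.6), negative ones first."""
    n, c = sum(counts), len(counts) - 1
    m = sum(x * n_x for x, n_x in enumerate(counts)) / n
    lower = max(-1.0 / (c - 1), -1.0 / m) + 1e-9  # stay just inside (2.6)
    f = lambda a: phi(a, counts)
    # alpha = 0 is always a double zero of phi, so it is skipped on both sides.
    return scan(f, lower, -gap, grid) + scan(f, gap, upper, grid)


for name, counts in SAMPLES.items():
    print(name, [round(r, 4) for r in ml_roots(counts)])
```

For D6 the scan returns two negative roots, and the ML-estimate is the greater of the two, in line with situation (III). A sample whose $\hat\sigma^2$ is extremely close to M could have a root inside the skipped gap around zero, so the gap would need to be reduced in that case.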
In fact, in the former case, one has to overcome at least the ML-estimation difficulties noted for situation (III); instead, in the latter, it is sufficient (Ferreri, 1996) to deal with the ML-estimation in the discrete context, which is where the $\kappa$-parameter of (1.2) properly belongs. On the other hand, regarding $\kappa$ as continuous would be like considering (1.1) and then assuming a rounded value without justification apart from mere simplicity. It is worth noting that, in such a way, the following ML-estimates 7, 6, 3, 3, 3 follow uniquely and easily for $\kappa$ from the distributions D2, ..., D6, respectively.
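One way to read the discrete estimation just described is as a search over integer values $\kappa \ge c$ of the log-likelihood of model (1.2), with $\mu$ set to the sample mean M (for fixed $\kappa$, the binomial ML-estimate of $p = \mu/\kappa$ is $M/\kappa$). The sketch below follows that reading; it is an assumption about the procedure rather than a transcription of Ferreri (1996), and the function names and the search cap `kappa_max = 200` are hypothetical.

```python
import math

SAMPLES = {
    "D2": [56, 35, 7, 2], "D3": [25, 10, 5, 60],
    "D4": [1, 8, 25, 66], "D5": [24, 1, 1, 74], "D6": [24, 2, 1, 73],
}


def profile_loglik(kappa, counts):
    """Log-likelihood of model (1.2) at integer kappa, with mu fixed at M."""
    n = sum(counts)
    m = sum(x * n_x for x, n_x in enumerate(counts)) / n
    p = m / kappa  # binomial probability mu / kappa
    return sum(
        n_x * (math.log(math.comb(kappa, x))
               + x * math.log(p)
               + (kappa - x) * math.log1p(-p))
        for x, n_x in enumerate(counts) if n_x > 0
    )


def kappa_hat(counts, kappa_max=200):
    """Integer kappa in [c, kappa_max] that maximizes the profile log-likelihood."""
    c = len(counts) - 1
    return max(range(c, kappa_max + 1), key=lambda k: profile_loglik(k, counts))


estimates = [kappa_hat(SAMPLES[name]) for name in ("D2", "D3", "D4", "D5", "D6")]
print(estimates)
```

Because the search runs over integers only, no root of (2.2) has to be located, which is the practical advantage claimed for model (1.2). The sketch needs Python 3.8 or later for `math.comb`.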

Appendix

Aragón et al. (1992) showed that the MLE of $\alpha$ solving Eq. (2.2) is unique when $\hat\sigma^2 > M$. They gave the proof firstly for c = 2 and then dealt with the case of $c \ge 3$ in the footsteps of the case c = 2. For this reason, here we limit ourselves to extending the proof to the situation of $\hat\sigma^2 < M$ only for c = 2. The basic ideas consist of (i) rewriting the ML-equation in terms of the variables $u = 1 - F_0$ and $M = (1 - F_0) + (1 - F_1)$, where $1 \ge u \ge M - u > 0$, and (ii) considering, hence, the equation $\phi(\alpha) = -\alpha u - \alpha(M - u)/(1 + \alpha) + \log(1 + \alpha M) = 0$, from which

$$u = u(M; \alpha) = \frac{(\alpha + 1)\log(1 + \alpha M) - \alpha M}{\alpha^2} \tag{A.1}$$

in the domain $D = \{(u, M): 1 \ge u \ge M - u > 0\}$ follows as a curve of the points (M, u) depending on $\alpha$. As $\alpha \to 0$, (A.1) reduces to the parabola $u(M; 0) = M - \tfrac{1}{2}M^2$ dividing D into two parts, of which

$$E^+ = \{(u, M): 0 \le M \le 1,\ \tfrac{1}{2}M \le u \le u(M; 0)\} \tag{A.2}$$

represents the subregion corresponding to $\hat\sigma^2 > M$, that is, relative to non-negative values of $\alpha$. Then, after showing that the derivatives

$$u_M(M; \alpha) = \frac{1 - M}{1 + \alpha M}, \qquad u_{MM}(M; \alpha) = -\frac{1 + \alpha}{(1 + \alpha M)^2} \tag{A.3}$$

and $u_{M\alpha}(M; \alpha) = -(1 - M)M/(1 + \alpha M)^2$ imply that the relation

$$u(M; \alpha_2) < u(M; \alpha_1) \ \text{for } 0 < \alpha_1 < \alpha_2, \quad \text{with } \lim_{\alpha\to\infty} u(M; \alpha) = 0, \tag{A.4}$$

is valid in $E^+$, they concluded: (i) the graphs of (A.1) are ordered in $E^+$ with respect to the positive values of $\alpha$; (ii) $E^+$ is a disjoint union of these graphs; (iii) a given pair (M, u) has, hence, exactly one graph containing it and the corresponding zero $\alpha_0 = \hat\alpha$ of the ML-equation is unique.

But the reasoning made by Aragón et al. ensures that the relation $u(M; \alpha_2) < u(M; \alpha_1)$ is valid also for

$$-1 < \alpha_1 < \alpha_2 < 0 \ \text{and } 0 \le M \le 1, \quad \text{with } \lim_{\alpha\to -1} u(M; \alpha) = M, \quad u(0; \alpha) = 0,$$

where $-1$ is the value of $-1/(c-1)$. Thus, their conclusions can still be extended to the graphs of (A.1) in the following D-subregion

$$E_1^- = \{(M, u): 0 \le M \le 1,\ u(M; 0) \le u < M\}$$

relative to the negative values of $\alpha$ corresponding to the case of $\hat\sigma^2 < M$ with $M \le c - 1$, that is, to the aforesaid situations (I) and (II). The remaining part of the domain D, given by the subregion

$$E_2^- = \{(M, u): 1 < M < 2,\ \tfrac{1}{2}M \le u < M\},$$

is instead relative to situation (III), specified by $\hat\sigma^2 < M$ with $M > c - 1 = 1$. For this situation, in which $M < -1/\alpha$ has to hold for (A.1), the derivatives (A.3) allow one to point out that: (i) each concavely increasing graph of $u(M; \alpha)$ in $E_1^-$ has a maximum at M = 1, from which it decreases as M increases up to an $M_\alpha$ such that $u(M_\alpha; \alpha) = \tfrac{1}{2}M_\alpha$; (ii) the relation $u_M(M; \alpha_1) < u_M(M; \alpha_2) < 0$ for $\alpha_1 < \alpha_2 < 0$, when M > 1, implies that the graph $L_2$ given by $\alpha_2$, which is below the graph $L_1$ given by $\alpha_1$ in a right neighbourhood of M = 1, intersects $L_1$ at a point $P(M; \alpha_1, \alpha_2)$ as M increases to the value $M_{\alpha_2}$ at which $L_2$ exits from D; (iii) the same relation and the ordering of the graphs indexed by $\alpha$ in $E_1^-$ ensure moreover that only the two graphs $L_1$ and $L_2$ intersect at $P(M; \alpha_1, \alpha_2)$. Therefore, each point of $E_2^-$ lying on a graph (as, for example, for M = 1.1 and u = 0.70) is an intersection point with one other graph only, and these two graphs correspond to the two roots which can arise in situation (III). Then, the fact that a part of $E_2^-$ is not covered by graphs accounts for the cases in which (as, for example, for M = 1.1 and u = 0.90) no root exists in the same situation.

Of course, in our notation, the extension $u = u(M, v; \alpha)$ of (A.1) for $c \ge 3$ would follow from setting $v = (v_1, \ldots, v_{c-2})$, where $v_h = 1 - F_h$, $h = 1, \ldots, c - 2$, with $1 \ge u \ge v_1 \ge \cdots \ge v_{c-2} \ge M - u - \sum_{h=1}^{c-2} v_h > 0$. Then, given that a frequency distribution D has M = 0 as a mean if and only if $f_0 = 1$ and, hence, $u = 0$, $v_1 = 0, \ldots, v_{c-2} = 0$, the relation $u(0, v; \alpha) = u(0, 0; \alpha) = 0$ can uniquely be considered (see Aragón et al.) in order to extend the relation (A.4).
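The two numerical examples quoted for situation (III), and one point of the subregion relative to situations (I) and (II), can be checked by counting how many negative values of $\alpha$ satisfy $u(M; \alpha) = u$ for c = 2. The sketch below does this with a plain grid scan; the function names, the grid size, the cut-off `-1e-3` near zero and the extra test point (M, u) = (0.8, 0.6) are assumptions introduced for illustration.

```python
import math


def u_curve(m, alpha):
    """u(M; alpha) of (A.1) for c = 2; needs alpha != 0 and 1 + alpha*M > 0."""
    return ((alpha + 1.0) * math.log1p(alpha * m) - alpha * m) / alpha ** 2


def count_negative_roots(m, u, grid=20000):
    """Count sign changes of u(M; alpha) - u over the admissible negative alpha."""
    lo = max(-1.0, -1.0 / m) + 1e-9  # bound (2.6) with c = 2
    hi = -1e-3                        # stop short of the double zero at alpha = 0
    xs = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    vals = [u_curve(m, a) - u for a in xs]
    return sum(1 for i in range(grid) if vals[i] * vals[i + 1] < 0)


# Situation (III): M = 1.1 > c - 1 = 1.
two_roots = count_negative_roots(1.1, 0.70)
no_root = count_negative_roots(1.1, 0.90)

# Situation (I): M = 0.8 < 1 with u(M; 0) = 0.48 <= u = 0.6 < M.
one_root = count_negative_roots(0.8, 0.60)

print(two_roots, no_root, one_root)
```

A point crossed by two graphs gives two roots, a point not covered by any graph gives none, and a point of the subregion for situations (I) and (II) lies on exactly one graph.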

Acknowledgements

The appendix was added since the referee and the Associate Editor required an algebraic proof about the number of the negative roots in the situations (I) and (III). The author is grateful to them for all helpful suggestions contributing to the clarity and formal accuracy of the paper.


References

Aragón, J., D. Eberly and S. Eberly (1992), Existence and uniqueness of the maximum likelihood estimator for two-parameter negative binomial distribution, Statist. Probab. Lett. 15, 375-379.

Clark, S.J. and J.N. Perry (1989), Estimation of the negative binomial parameter k by quasi-likelihood, Biometrics 45, 309-316.

DeRiggi, D. (1983), Unimodality of likelihood functions for the positive binomial distribution, J. Amer. Statist. Assoc. 78, 181-183.

DeRiggi, D. (1994), Sufficient conditions for unimodality of the positive binomial likelihood function, Statist. Probab. Lett. 19, 1-4.

Ferreri, C. (1996), On a hyperbinomial process, Comm. Statist. Theory Methods 25 (1), 83-103.

Olkin, I., A.J. Petkau and J.V. Zidek (1981), A comparison of n estimators for the binomial distribution, J. Amer. Statist. Assoc. 76, 637-642.