Statistics & Probability Letters 39 (1998) 333-336
A Poisson limit law for a generalized birthday problem
Norbert Henze
Institut für Mathematische Stochastik, Universität Karlsruhe, Postfach 6980, Englerstr. 2, 76128 Karlsruhe, Germany
Received March 1998
Abstract
Balls are placed sequentially at random into n cells. Write $T_{n,c}^{(m)}$ for the number of balls needed until for the mth time a ball is placed into a cell already containing c − 1 balls, where $m \ge 1$ and $c \ge 2$ are fixed integers. For fixed t > 0, let $X_{n,c}$ denote the number of cells containing at least c balls after the placement of $k_n = [n^{1-1/c} t]$ balls. It is shown that, as $n \to \infty$, the limit distribution of $X_{n,c}$ is Poisson with parameter $t^c/c!$. As a consequence, the limit law of $n^{1-c} (T_{n,c}^{(m)})^c / c!$ is a Gamma distribution. © 1998 Elsevier Science B.V. All rights reserved
AMS classification: Primary: 60F05; secondary: 60C05
Keywords: Sequential occupancy problem; General birthday problem; Coincidences; Inclusion-exclusion principle
1. Introduction
The classical birthday problem (see, e.g., Feller, 1968, p. 33) asks for the probability that out of k people at least two share the same birthday. More generally, we may ask for the probability that among a group of k people, at least c individuals have the same birthday, where $c \ge 2$ is a fixed integer. Whereas the literature related to the classical birthday problem (c = 2) is large (see, e.g., Joag-Dev and Proschan, 1992; Mase, 1992; Nunnikhoven, 1992), there are only a few papers dealing with the case c > 2 (see, e.g., McKinney, 1966; Holst, 1986; Holst, 1995; Diaconis and Mosteller, 1989; Klamkin and Newman, 1967; Dwass, 1969).
Cast in the form of a general sequential occupancy problem, suppose that balls are placed sequentially and independently of each other into n cells, labeled from 1 to n. Each ball falls into any particular cell with probability 1/n. The process terminates as soon as a ball is placed into a cell already containing c − 1 balls. Let $T_{n,c}$ denote the waiting time until such a c-fold collision occurs. Klamkin and Newman (1967) showed that
$$E(T_{n,c}) = \int_0^\infty \left( \sum_{j=0}^{c-1} \frac{(t/n)^j}{j!} \right)^{n} e^{-t} \, dt$$
and derived the asymptotic formula
$$E(T_{n,c}) \sim \sqrt[c]{c!} \; \Gamma\left(1 + \frac{1}{c}\right) n^{1-1/c} \quad \text{as } n \to \infty.$$
Dwass (1969) used the method of moments to show that
$$\lim_{n\to\infty} P\left(n^{1/c-1} T_{n,c} \le t\right) = 1 - \exp\left(-\frac{t^c}{c!}\right), \quad t > 0. \tag{1}$$
That is, the limit distribution of $n^{1/c-1} T_{n,c}$, as $n \to \infty$, is a Weibull distribution. By embedding the drawings in Poisson processes, Holst (1986) rederived (1) and, among other things, proved that $\{n^{1/c-1} T_{n,c} : c = 2, 3, \ldots, m\}$ are asymptotically independent for each fixed $m \ge 3$.
In this paper, we present a Poisson limit theorem for the number of cells containing at least c balls, after a suitable number of balls has been placed. This result adds to the large number of limit theorems for occupancy problems (see, e.g., Johnson and Kotz, 1977, Ch. 6), and it yields (1) as a corollary. The method of proof is elementary and thus also of interest from a pedagogical point of view.
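The Weibull limit (1) can be illustrated by simulation. The following Python sketch is not part of the paper: the function names (`waiting_time`, `weibull_limit_cdf`, `empirical_cdf`) are hypothetical, and the Monte Carlo estimate only approximates the left-hand side of (1) for one finite n.

```python
import math
import random

def waiting_time(n, c, rng):
    """Throw balls uniformly into n cells until some cell first holds c balls."""
    counts = [0] * n
    balls = 0
    while True:
        balls += 1
        cell = rng.randrange(n)
        counts[cell] += 1
        if counts[cell] == c:  # the ball landed in a cell that held c - 1 balls
            return balls

def weibull_limit_cdf(t, c):
    """Right-hand side of (1): 1 - exp(-t^c / c!)."""
    return 1.0 - math.exp(-t ** c / math.factorial(c))

def empirical_cdf(n, c, t, reps, seed=0):
    """Monte Carlo estimate of P(n^(1/c - 1) * T_{n,c} <= t)."""
    rng = random.Random(seed)
    scale = n ** (1.0 / c - 1.0)
    hits = sum(scale * waiting_time(n, c, rng) <= t for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    n, c, t = 1000, 2, 1.0
    print("simulated:", empirical_cdf(n, c, t, reps=2000))
    print("limit (1):", weibull_limit_cdf(t, c))
```

For moderate n the simulated value should be close to, but not equal to, the limit, since (1) is an asymptotic statement.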
2. Main result
For convenience, we first state an auxiliary result which expresses the tail probabilities of the multinomial distribution in terms of incomplete Dirichlet integrals (cf. Lemma 2.2 of Olkin and Sobel, 1965, or formula (35.29) of Johnson et al., 1997; there are some misprints in the latter formula).
Lemma 2.1. Let $(N_1,\ldots,N_r,N_{r+1})$ have a multinomial distribution with k trials and probability vector $(p_1,\ldots,p_r,p_{r+1})$, where $p_j > 0$ and $\sum_{i=1}^{r+1} p_i = 1$. For $c \ge 1$ such that $rc \le k$, we have
$$P\left(\bigcap_{i=1}^{r} \{N_i \ge c\}\right) = \frac{k!}{((c-1)!)^r \, (k-rc)!} \int_0^{p_1} \cdots \int_0^{p_r} \left(1 - \sum_{i=1}^{r} x_i\right)^{k-rc} \prod_{i=1}^{r} x_i^{c-1} \, dx_r \cdots dx_2 \, dx_1.$$
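As a sanity check of Lemma 2.1, the following Python sketch compares both sides of the identity for r = 2. It is an illustration only: the function names are hypothetical, the left-hand side is computed by summing the trinomial probabilities directly, and the Dirichlet integral is approximated by a plain midpoint rule, so agreement holds only up to discretization error.

```python
import math

def multinomial_tail_exact(k, c, p1, p2):
    """P(N1 >= c, N2 >= c) for r = 2, summing the trinomial pmf directly."""
    p3 = 1.0 - p1 - p2
    total = 0.0
    for i in range(c, k + 1):
        for j in range(c, k - i + 1):
            rest = k - i - j
            coef = math.factorial(k) // (
                math.factorial(i) * math.factorial(j) * math.factorial(rest))
            total += coef * p1 ** i * p2 ** j * p3 ** rest
    return total

def multinomial_tail_dirichlet(k, c, p1, p2, grid=200):
    """Right-hand side of Lemma 2.1 for r = 2, via a midpoint rule on [0,p1] x [0,p2]."""
    r = 2
    const = math.factorial(k) / (
        math.factorial(c - 1) ** r * math.factorial(k - r * c))
    h1, h2 = p1 / grid, p2 / grid
    acc = 0.0
    for a in range(grid):
        x1 = (a + 0.5) * h1
        for b in range(grid):
            x2 = (b + 0.5) * h2
            acc += (1.0 - x1 - x2) ** (k - r * c) * (x1 * x2) ** (c - 1)
    return const * acc * h1 * h2

if __name__ == "__main__":
    print(multinomial_tail_exact(10, 2, 0.2, 0.3))
    print(multinomial_tail_dirichlet(10, 2, 0.2, 0.3))
```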
In what follows, fix t > 0 and let $k_n = [n^{1-1/c} t]$, where [z] is the largest integer less than or equal to z. Furthermore, let $X_{n,c}$ denote the number of cells containing at least c balls, after $k_n$ balls have been placed.
Theorem 2.2. We have
$$X_{n,c} \xrightarrow{\mathcal{D}} \mathrm{Po}\left(\frac{t^c}{c!}\right) \quad \text{as } n \to \infty,$$
where $\xrightarrow{\mathcal{D}}$ is convergence in distribution and $\mathrm{Po}(\lambda)$ denotes the Poisson distribution with parameter $\lambda$.
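Theorem 2.2 can be illustrated numerically. The sketch below, with hypothetical function names not taken from the paper, simulates $X_{n,c}$ for one finite n and tabulates the empirical frequencies next to the Poisson probabilities with parameter $t^c/c!$.

```python
import math
import random
from collections import Counter

def cells_with_at_least_c(n, c, t, rng):
    """Throw k_n = floor(n^(1-1/c) * t) balls into n cells; count cells holding >= c balls."""
    k_n = math.floor(n ** (1.0 - 1.0 / c) * t)
    counts = Counter(rng.randrange(n) for _ in range(k_n))
    return sum(1 for v in counts.values() if v >= c)

def poisson_pmf(lam, l):
    """P(Po(lam) = l)."""
    return math.exp(-lam) * lam ** l / math.factorial(l)

def compare(n, c, t, reps, seed=0):
    """Map l -> (empirical frequency of X_{n,c} = l, limiting Poisson probability)."""
    rng = random.Random(seed)
    freq = Counter(cells_with_at_least_c(n, c, t, rng) for _ in range(reps))
    lam = t ** c / math.factorial(c)
    return {l: (freq[l] / reps, poisson_pmf(lam, l))
            for l in range(max(freq) + 1)}

if __name__ == "__main__":
    for l, (emp, lim) in compare(10_000, 2, 1.5, reps=4000).items():
        print(l, round(emp, 4), round(lim, 4))
```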
Proof. Writing $A_i$ for the event that the ith cell contains at least c balls after the placement of $k_n$ balls, we have $X_{n,c} = \sum_{i=1}^{n} \mathbf{1}\{A_i\}$, where $\mathbf{1}\{A\}$ denotes the indicator function of an event A. From the exchangeability of $A_1,\ldots,A_n$, the fact that $A_1 \cap \cdots \cap A_r = \emptyset$ if $rc > k_n$, and Jordan's formula for the realization of exactly l among n events (see, e.g., Feller, 1968, p. 106), it follows that, for fixed $l \ge 0$ and sufficiently large n,
$$P(X_{n,c} = l) = \sum_{r=l}^{[k_n/c]} (-1)^{r-l} \binom{r}{l} \binom{n}{r} P(A_1 \cap \cdots \cap A_r). \tag{2}$$
Let $N_i$ denote the number of balls in the ith cell after the placement of $k_n$ balls ($i = 1,\ldots,r$), and let $N_{r+1} = k_n - \sum_{i=1}^{r} N_i$. Since the distribution of $(N_1,\ldots,N_{r+1})$ is multinomial with $k_n$ trials and probabilities $p_i = 1/n$ ($i = 1,\ldots,r$) and $p_{r+1} = 1 - r/n$, Lemma 2.1 and the change of variable $u_i = n x_i$ ($i = 1,\ldots,r$) yield
$$P(A_1 \cap \cdots \cap A_r) = P(N_1 \ge c, \ldots, N_r \ge c) = \gamma_n \int_0^1 \cdots \int_0^1 \prod_{i=1}^{r} u_i^{c-1} \left(1 - \sum_{i=1}^{r} \frac{u_i}{n}\right)^{k_n - rc} du,$$
where $du = du_r \cdots du_2 \, du_1$ and
$$\gamma_n = \frac{k_n!}{((c-1)!)^r \, (k_n - rc)!} \cdot \frac{1}{n^{rc}}.$$
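Since, by dominated convergence, the r-fold integral converges to $c^{-r}$, and since $\binom{n}{r} \sim n^r/r!$ and $k_n!/(k_n - rc)! \sim k_n^{rc}$ as $n \to \infty$, it follows from the definition of $k_n$ that
$$\lim_{n\to\infty} \binom{n}{r} P(A_1 \cap \cdots \cap A_r) = \lim_{n\to\infty} \frac{k_n^{rc}}{r! \, (c!)^r \, n^{r(c-1)}} = \frac{1}{r!} \left(\frac{t^c}{c!}\right)^r. \tag{3}$$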
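For r = 1, the limit (3) says that $n \, P(A_1)$, which is the expected number of cells with at least c balls, tends to $t^c/c!$. The following Python sketch checks this special case numerically. It is only an illustration under that restriction: the function names are hypothetical, and $P(A_1)$ is evaluated as a binomial upper tail with $k_n$ trials and success probability 1/n.

```python
import math

def binom_upper_tail(k, p, c, extra_terms=60):
    """P(Bin(k, p) >= c), summed upward from the c-th term to avoid cancellation."""
    if c > k:
        return 0.0
    # log of the c-th binomial probability, via log-gamma to avoid overflow
    log_term = (math.lgamma(k + 1) - math.lgamma(c + 1) - math.lgamma(k - c + 1)
                + c * math.log(p) + (k - c) * math.log1p(-p))
    term = math.exp(log_term)
    total = 0.0
    for j in range(c, min(k, c + extra_terms) + 1):
        total += term
        term *= (k - j) / (j + 1) * p / (1.0 - p)  # ratio of consecutive pmf values
    return total

def scaled_single_cell_probability(n, c, t):
    """n * P(A_1) with k_n = floor(n^(1-1/c) * t) balls."""
    k_n = math.floor(n ** (1.0 - 1.0 / c) * t)
    return n * binom_upper_tail(k_n, 1.0 / n, c)

if __name__ == "__main__":
    c, t = 3, 1.5
    print("limit t^c/c! =", t ** c / math.factorial(c))
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        print(n, scaled_single_cell_probability(n, c, t))
```

The extra terms beyond the c-th are negligible here because $k_n/n \to 0$; truncating after 60 of them is a convenience of this sketch, not part of the argument.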
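From (2) and Bonferroni's inequalities (see Feller, 1968, p. 110), we thus obtain
$$\lim_{n\to\infty} P(X_{n,c} = l) = \frac{1}{l!} \sum_{r=l}^{\infty} \frac{(-1)^{r-l}}{(r-l)!} \left(\frac{t^c}{c!}\right)^r = \exp\left(-\frac{t^c}{c!}\right) \frac{1}{l!} \left(\frac{t^c}{c!}\right)^l,$$
as was to be shown. □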
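The last step of the proof rests on the alternating series with $\lambda = t^c/c!$, whose truncations bracket the Poisson probability from above and below. A small Python sketch, with hypothetical function names, makes this visible for concrete values.

```python
import math

def truncated_series(lam, l, upper):
    """Sum over r = l..upper of (-1)^(r-l) * lam^r / (l! * (r-l)!)."""
    return sum((-1) ** (r - l) * lam ** r
               / (math.factorial(l) * math.factorial(r - l))
               for r in range(l, upper + 1))

def poisson_pmf(lam, l):
    """Closed form of the full series: exp(-lam) * lam^l / l!."""
    return math.exp(-lam) * lam ** l / math.factorial(l)

if __name__ == "__main__":
    lam, l = 1.5 ** 2 / math.factorial(2), 2  # t = 1.5, c = 2
    for extra in range(6):
        print(extra, truncated_series(lam, l, l + extra), poisson_pmf(lam, l))
```

Truncating after an even number of additional terms overshoots the limit and truncating after an odd number undershoots it, which is the pattern that Bonferroni-type bounds exploit.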
Since the event $\{X_{n,c} \ge 1\}$ is equivalent to $\{T_{n,c} \le k_n\}$, that is, to $\{n^{1/c-1} T_{n,c} \le t\}$, the limit law (1) follows from Theorem 2.2. To obtain (1) directly, write
$$P(X_{n,c} \ge 1) = P(A_1 \cup \cdots \cup A_n) = \sum_{r=1}^{[k_n/c]} (-1)^{r-1} \binom{n}{r} P(A_1 \cap \cdots \cap A_r)$$
and proceed as above using (3) and Bonferroni's inequalities.
As a generalization of the waiting time $T_{n,c}$ until the first c-fold collision occurs, let $T_{n,c}^{(m)}$ be the number of balls needed until for the mth time a ball is placed into a cell already containing c − 1 balls, so that $T_{n,c}^{(1)} = T_{n,c}$ with this new notation. From Theorem 2.2, we obtain the following result on the asymptotic behavior of $T_{n,c}^{(m)}$.
Corollary 2.3. For fixed $c \ge 2$ and $m \ge 1$, we have
$$\lim_{n\to\infty} P\left(n^{1/c-1} T_{n,c}^{(m)} \le t\right) = 1 - \exp\left(-\frac{t^c}{c!}\right) \sum_{v=0}^{m-1} \frac{1}{v!} \left(\frac{t^c}{c!}\right)^v$$
(t > 0). In other words,
$$\frac{n^{1-c} \left(T_{n,c}^{(m)}\right)^c}{c!} \xrightarrow{\mathcal{D}} \sum_{j=1}^{m} E_j \quad \text{as } n \to \infty,$$
where $E_1,\ldots,E_m$ are independent standard exponential random variables.
Proof. The first assertion is an immediate consequence of Theorem 2.2 since the events $\{X_{n,c} \ge m\}$ and $\{T_{n,c}^{(m)} \le n^{1-1/c} t\}$ are the same. The second assertion then follows on performing the transformation $u \mapsto u^c/c!$.
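Corollary 2.3 can also be checked by simulation. In the Python sketch below (function names are hypothetical), the limit is written as the distribution function of a sum of m independent standard exponentials evaluated at $u = t^c/c!$, which makes the transformation in the proof explicit; the simulation assumes $m \le n$ so that the mth collision eventually occurs.

```python
import math
import random

def mth_collision_time(n, c, m, rng):
    """Balls needed until, for the m-th time, a ball lands in a cell holding c - 1 balls."""
    counts = {}
    collisions = 0
    balls = 0
    while collisions < m:
        balls += 1
        cell = rng.randrange(n)
        counts[cell] = counts.get(cell, 0) + 1
        if counts[cell] == c:
            collisions += 1
    return balls

def erlang_cdf(u, m):
    """P(E_1 + ... + E_m <= u) for independent standard exponentials E_j."""
    return 1.0 - math.exp(-u) * sum(u ** v / math.factorial(v) for v in range(m))

def corollary_limit_cdf(t, c, m):
    """Limit in Corollary 2.3, obtained from erlang_cdf via u = t^c / c!."""
    return erlang_cdf(t ** c / math.factorial(c), m)

def empirical_cdf(n, c, m, t, reps, seed=0):
    """Monte Carlo estimate of P(n^(1/c - 1) * T^{(m)}_{n,c} <= t)."""
    rng = random.Random(seed)
    scale = n ** (1.0 / c - 1.0)
    hits = sum(scale * mth_collision_time(n, c, m, rng) <= t for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    print("simulated:", empirical_cdf(2000, 2, 2, 2.0, reps=2000))
    print("limit:    ", corollary_limit_cdf(2.0, 2, 2))
```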
References
Diaconis, P., Mosteller, F., 1989. Methods for studying coincidences. J. Amer. Statist. Assoc. 84, 853-861.
Dwass, M., 1969. More birthday surprises. J. Combin. Theory 7, 258-261.
Feller, W., 1968. An Introduction to Probability Theory and Its Applications, vol. 1, 3rd ed. Wiley, New York.
Holst, L., 1986. On birthday, collectors', occupancy and other classical urn problems. Int. Statist. Rev. 54, 15-27.
Holst, L., 1995. The general birthday problem. Random Struct. Algorithms 6, 201-208.
Joag-Dev, K., Proschan, F., 1992. Birthday problem with unlike probabilities. Amer. Math. Monthly 99, 10-12.
Johnson, N.L., Kotz, S., 1977. Urn Models and Their Application. An Approach to Modern Discrete Probability Theory. Wiley, New York.
Johnson, N.L., Kotz, S., Balakrishnan, N., 1997. Discrete Multivariate Distributions. Wiley, New York.
Klamkin, M.S., Newman, D.J., 1967. Extensions of the birthday surprise. J. Combin. Theory 3, 279-282.
Mase, S., 1992. Approximations to the birthday problem with unequal occurrence probabilities and their application to the surname problem in Japan. Ann. Inst. Statist. Math. 44, 479-499.
McKinney, E., 1966. Generalized birthday problem. Amer. Math. Monthly 73, 385-387.
Nunnikhoven, T., 1992. A birthday problem solution for nonuniform birth frequencies. Amer. Statist. 46, 270-274.
Olkin, I., Sobel, M., 1965. Integral expressions for tail probabilities of the multinomial and the negative multinomial distribution. Biometrika 52, 167-179.