ARTICLE IN PRESS Journal of Statistical Planning and Inference 140 (2010) 2355–2383
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
Discrete q-distributions on Bernoulli trials with a geometrically varying success probability

Ch.A. Charalambides
Department of Mathematics, University of Athens, GR-15784 Athens, Greece
Article info

Article history: Received 9 November 2009; received in revised form 8 March 2010; accepted 10 March 2010; available online 19 March 2010.

MSC: primary 60C05; secondary 05A30

Keywords: Euler distribution; Heine distribution; Negative q-binomial distribution; q-Binomial distribution; q-Stirling distributions

Abstract

Consider a sequence of independent Bernoulli trials and assume that the odds of success (or failure) or the probability of success (or failure) at the $i$th trial varies (increases or decreases) geometrically with rate (proportion) $q$, for increasing $i = 1, 2, \ldots$. Introducing the notion of a geometric sequence of trials as a sequence of Bernoulli trials, with constant probability, that is terminated with the occurrence of the first success, a useful stochastic model is constructed. Specifically, consider a sequence of independent geometric sequences of trials and assume that the probability of success at the $j$th geometric sequence varies (increases or decreases) geometrically with rate (proportion) $q$, for increasing $j = 1, 2, \ldots$. On both models, let $X_n$ be the number of successes up to the $n$th trial and $T_k$ (or $W_k$) be the number of trials (or failures) until the occurrence of the $k$th success. The distributions of these random variables turn out to be q-analogues of the binomial and Pascal (or negative binomial) distributions. The distributions of $X_n$, for $n \to \infty$, and the distributions of $W_k$, for $k \to \infty$, can be approximated by a q-Poisson distribution. Also, as $k \to 0$, a zero-truncated negative q-binomial distribution $U_k = W_k \mid W_k > 0$ can be approximated by a q-logarithmic distribution. These discrete q-distributions and their applications are reviewed, with critical comments and additions. Finally, consider a sequence of independent Bernoulli trials and assume that the probability of success (or failure) is a product of two sequences of probabilities, with one of these sequences depending only on the number of trials and the other depending only on the number of successes (or failures). The q-distributions of the number $X_n$ of successes up to the $n$th trial and the number $T_k$ of trials until the occurrence of the $k$th success are similarly reviewed.

© 2010 Elsevier B.V. All rights reserved.
Contents

1. Introduction
2. Some basic q-sequences and functions
3. Success probability varying with the number of trials
   3.1. Success odds geometrically decreasing
   3.2. Success probability geometrically decreasing
   3.3. Failure odds geometrically increasing
4. Success probability varying with the number of successes
   4.1. Success probability geometrically increasing
   4.2. Success probability geometrically decreasing
5. Success probability varying with the number of trials and the number of successes
   5.1. Success probability geometrically varying with the same rate
   5.2. Success probability geometrically varying with different rates
6. Limiting q-distributions
   6.1. q-Poisson distributions
   6.2. q-Logarithmic distribution
Acknowledgements
References

E-mail address: [email protected]
0378-3758/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2010.03.024
1. Introduction

Consider a sequence of independent Bernoulli trials and let $X_n$ be the number of successes up to the $n$th trial. Also, let $T_k$ (or $W_k$) be the number of trials (or failures) until the occurrence of the $k$th success. If the probability of success is $p$, then $X_n$ and $T_k$ (or $W_k$) obey the binomial and Pascal (or negative binomial) distribution, respectively. The Poisson distribution may be considered as a limiting case of the binomial distribution as $n \to \infty$. Also, the logarithmic distribution may be considered as a limiting case of the zero-truncated negative binomial distribution as $k \to 0$.

Poisson (1837) generalized the binomial distribution (and implicitly the negative binomial distribution) by assuming that the probability of success at the $i$th trial varies with the number of trials. The probability function of $X_n$ was derived by Platonov (1976) in terms of the generalized signless Stirling numbers of the first kind. Balakrishnan and Nevzorov (1997) obtained this distribution as the distribution of the number of records up to time $n$ in a general record model.

The negative binomial distribution (and implicitly the binomial distribution) can be generalized in a different direction by first introducing the notion of a geometric sequence of trials. Specifically, a sequence of independent Bernoulli trials, with constant probability of success, which is terminated with the occurrence of the first success, is called a geometric sequence of trials. Consider now a sequence of independent geometric sequences of trials and assume that the probability of success at the $j$th geometric sequence varies with the number of sequences (successes). Note that this is equivalent to the assumption that the conditional probability of success at any trial varies with the number of successes occurring in the previous trials. The probability function of $X_n$ was derived by Woodbury (1949) essentially in terms of generalized Stirling numbers of the second kind.
Sen and Balakrishnan (1999) obtained the distribution of $T_k$ in connection with a reliability model; their expression is also essentially in terms of the generalized Stirling numbers of the second kind.

In the present paper, the discrete q-distributions defined on the stochastic model of a sequence of independent Bernoulli trials, with a geometrically varying success probability, are reviewed and, as examples, some of their applications are presented. Related models and distributions, connections of certain q-distributions and various comments on them are indicated as remarks. As usually happens in a review paper, apart from a unified presentation, some new proofs of existing results and new results filling some gaps are included.

Specifically, in Section 2 some basic q-sequences and functions are presented and their properties, which are useful in the presentation of the probability functions and moments of the q-distributions, are quoted. Section 3 is devoted to the Poisson–Bernoulli model, in which the success probability varies with the number of trials. Two basic theorems concerning the distributions of $X_n$ and $T_k$ are given and, as corollaries, the q-distributions defined on this model are reviewed. In Section 4 the Woodbury–Bernoulli model, in which the success probability varies with the number of successes, is considered and the distributions of $T_k$ and $X_n$ are obtained in two basic theorems. Further, as corollaries, the q-distributions defined on this model are reviewed. A stochastic model in which the success probability varies both with the number of trials and the number of successes in a specific manner, and which includes in particular the above two models, is considered in Section 5. The distributions of $X_n$ and $T_k$ are obtained in two theorems. Then, as corollaries, the q-distributions for the cases in which the success probability varies with the same and with different rates are reviewed.

Finally, Section 6 is concerned with the Heine and Euler distributions, which are two q-Poisson distributions, and the q-logarithmic distribution. The Heine and Euler distributions are obtained as limiting distributions of a q-binomial and a negative q-binomial distribution, respectively. It is noticed that the generalized Euler distribution, studied by Benkherouf and Alzaid (1993), is merely a (general) negative q-binomial distribution. The q-logarithmic distribution is obtained as a limiting distribution of a zero-truncated negative q-binomial distribution.
2. Some basic q-sequences and functions

Let $x$ and $q$ be real numbers, with $q \neq 1$, and let $k$ be an integer. The number $[x]_q = (1-q^x)/(1-q)$ is called a q-number and, in particular, $[k]_q$ is called a q-integer. The $k$th order factorial of the q-number $[x]_q$, which is defined by

$$[x]_{k,q} = [x]_q [x-1]_q \cdots [x-k+1]_q, \quad k = 0, 1, \ldots,$$

is called the q-factorial of $x$ of order $k$. In particular, $[k]_q! = [1]_q [2]_q \cdots [k]_q$ is called the q-factorial. The q-binomial coefficient (or Gaussian polynomial) is defined by

$$\binom{x}{k}_q = \frac{[x]_{k,q}}{[k]_q!}, \quad k = 0, 1, \ldots .$$

The general q-binomial and the negative q-binomial formulae may be expressed as

$$\prod_{i=1}^{\infty} \frac{1 + t q^{i-1}}{1 + t q^{x+i-1}} = \sum_{k=0}^{\infty} q^{\binom{k}{2}} \binom{x}{k}_q t^k, \quad |t| < 1, \ 0 < q < 1 \tag{2.1}$$

and

$$\prod_{i=1}^{\infty} \frac{1 - t q^{x+i-1}}{1 - t q^{i-1}} = \sum_{k=0}^{\infty} \binom{x+k-1}{k}_q t^k, \quad |t| < 1, \ 0 < q < 1, \tag{2.2}$$

respectively. In particular, for $x = n$ a positive integer, these formulae reduce to

$$\prod_{i=1}^{n} (1 + t q^{i-1}) = \sum_{k=0}^{n} q^{\binom{k}{2}} \binom{n}{k}_q t^k, \quad -\infty < t < \infty \tag{2.3}$$

and

$$\prod_{i=1}^{n} (1 - t q^{i-1})^{-1} = \sum_{k=0}^{\infty} \binom{n+k-1}{k}_q t^k, \quad |t| < 1, \ 0 < q < 1, \tag{2.4}$$

respectively. Expanding both members of the identity

$$\prod_{i=1}^{\infty} \frac{1 + t q^{i-1}}{1 + t q^{x+y+i-1}} = \prod_{i=1}^{\infty} \frac{1 + t q^{i-1}}{1 + t q^{x+i-1}} \prod_{i=1}^{\infty} \frac{1 + (t q^x) q^{i-1}}{1 + (t q^x) q^{y+i-1}}$$

into powers of $t$, with the aid of (2.1), and then equating the coefficients of $t^n$ on both sides of the resulting expression, the q-Cauchy formula

$$\sum_{k=0}^{n} q^{(n-k)(x-k)} \binom{x}{k}_q \binom{y}{n-k}_q = \binom{x+y}{n}_q \tag{2.5}$$

or, equivalently, the q-Vandermonde formula

$$[x+y]_{n,q} = \sum_{k=0}^{n} q^{(n-k)(x-k)} \binom{n}{k}_q [x]_{k,q} [y]_{n-k,q} \tag{2.6}$$
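is deduced. In general, the transition from a formula to its q-analogue is not unique. Thus, the following two q-exponential functions

$$E_q(t) = \prod_{i=1}^{\infty} \bigl(1 + (1-q) q^{i-1} t\bigr) = \sum_{k=0}^{\infty} q^{\binom{k}{2}} \frac{t^k}{[k]_q!}, \quad -\infty < t < \infty, \tag{2.7}$$

$$e_q(t) = \prod_{i=1}^{\infty} \bigl(1 - (1-q) q^{i-1} t\bigr)^{-1} = \sum_{k=0}^{\infty} \frac{t^k}{[k]_q!}, \quad |t| < 1/(1-q), \tag{2.8}$$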
with $E_q(t) e_q(-t) = 1$, will be used in the sequel.

The noncentral q-Stirling numbers of the first and second kind, $s_q(n,k;r)$ and $S_q(n,k;r)$, which connect the noncentral q-factorials with the q-numbers, are defined by

$$[t-r]_{n,q} = q^{-\binom{n}{2} - rn} \sum_{k=0}^{n} s_q(n,k;r) [t]_q^k, \quad n = 0, 1, \ldots \tag{2.9}$$

and

$$[t]_q^n = \sum_{k=0}^{n} q^{\binom{k}{2} + rk} S_q(n,k;r) [t-r]_{k,q}, \quad n = 0, 1, \ldots, \tag{2.10}$$

respectively. Note that $|s_q(n,k;r)| = s_q(n,k;r)/[-1]_q^{n-k} > 0$ and $S_q(n,k;r) > 0$. More generally, the noncentral generalized q-factorial coefficients $C_q(n,k;s,r)$ are defined by

$$[st+r]_{n,q} = q^{-\binom{n}{2} + rn} \sum_{k=0}^{n} q^{s\binom{k}{2}} C_q(n,k;s,r) [t]_{k,q^s}, \quad n = 0, 1, \ldots . \tag{2.11}$$
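The q-numbers, q-factorials and q-binomial coefficients defined at the start of this section are easy to evaluate exactly, which gives a quick numerical check of the finite q-binomial formula (2.3) and the q-Vandermonde formula (2.6). The following is a minimal sketch, assuming integer arguments and a rational $q$; the function names (`q_number`, `q_falling`, `q_binomial` and the two `check_*` helpers) are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import comb, prod

def q_number(x, q):
    """q-number [x]_q = (1 - q^x) / (1 - q), for integer x and q != 1."""
    return (1 - q**x) / (1 - q)

def q_falling(x, k, q):
    """q-factorial of x of order k: [x]_q [x-1]_q ... [x-k+1]_q (equal to 1 for k = 0)."""
    return prod((q_number(x - i, q) for i in range(k)), start=Fraction(1))

def q_binomial(x, k, q):
    """q-binomial coefficient [x]_{k,q} / [k]_q!."""
    return q_falling(x, k, q) / q_falling(k, k, q)

def check_q_binomial_formula(n, t, q):
    """Compare both sides of (2.3) for a positive integer n."""
    lhs = prod((1 + t * q**(i - 1) for i in range(1, n + 1)), start=Fraction(1))
    rhs = sum(q**comb(k, 2) * q_binomial(n, k, q) * t**k for k in range(n + 1))
    return lhs == rhs

def check_q_vandermonde(x, y, n, q):
    """Compare both sides of (2.6) for integers x and y."""
    lhs = q_falling(x + y, n, q)
    rhs = sum(
        q**((n - k) * (x - k)) * q_binomial(n, k, q)
        * q_falling(x, k, q) * q_falling(y, n - k, q)
        for k in range(n + 1)
    )
    return lhs == rhs

q = Fraction(1, 2)
print(q_number(3, q))                               # 1 + q + q^2 = 7/4
print(q_binomial(4, 2, q))                          # Gaussian polynomial at q = 1/2
print(check_q_binomial_formula(5, Fraction(3), q))
print(check_q_vandermonde(5, 4, 3, q))
```

Exact rational arithmetic is used so that both sides of each identity can be compared with `==` rather than with a floating-point tolerance.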
For $r = 0$, these numbers reduce to $s_q(n,k;0) \equiv s_q(n,k)$, $S_q(n,k;0) \equiv S_q(n,k)$ and $C_q(n,k;s,0) = C_q(n,k;s)$, the usual (central) q-Stirling numbers of the first and second kind and the generalized q-factorial coefficients, respectively. Also, $L_q(n,k;r) = C_{q^{-1}}(n,k;-1,-r)$ and $L_q(n,k) = C_{q^{-1}}(n,k;-1)$ are the noncentral and central q-Lah numbers, respectively. The values of $|s_q(n,k)| = |s_q(n,k;0)|$, for $k = 1, 2$ and $n = k, k+1, \ldots$, which will be used in the next sections, are quoted here for easy reference:

$$|s_q(n,1)| = q^{n-1} [n-1]_q!, \qquad |s_q(n,2)| = q^{n-2} [n-1]_q!\, \zeta_{n-1,q}(1), \tag{2.12}$$

where $\zeta_{n,q}(k) = \sum_{j=1}^{n} 1/[j]_q^k$, $k \geq 1$, $n = 1, 2, \ldots$.

The q-binomial coefficients and the q-Stirling numbers constitute particular cases of certain symmetric functions. These functions, also known as generalized Stirling numbers, are briefly presented for easy reference. The generalized signless Stirling numbers of the first kind may be defined by

$$\prod_{i=1}^{n} (t + a_i) = \sum_{k=0}^{n} |s(n,k;a)|\, t^k, \tag{2.13}$$

with $a = (a_1, a_2, \ldots, a_n)$. Note that

$$|s(n,n-k;a)| = \sum a_{i_1} a_{i_2} \cdots a_{i_k}, \tag{2.14}$$

where the summation is extended over all $k$-combinations $\{i_1, i_2, \ldots, i_k\}$ of the $n$ indices $\{1, 2, \ldots, n\}$, is the elementary symmetric function (with respect to the $n$ variables $a_1, a_2, \ldots, a_n$). The values of $|s(n,k;a)|$, for $k = 1, 2$ and $n = k, k+1, \ldots$, which will be used in the next sections, are quoted here for easy reference:

$$|s(n,1;a)| = \left(\prod_{i=1}^{n} a_i\right) \left(\sum_{i=1}^{n} \frac{1}{a_i}\right), \qquad |s(n,2;a)| = \left(\prod_{i=1}^{n} a_i\right) \left(\sum_{j=2}^{n} \sum_{i=1}^{j-1} \frac{1}{a_i a_j}\right). \tag{2.15}$$

The generalized Stirling numbers of the second kind may be defined by

$$t^n = \sum_{k=0}^{n} S(n,k;a) \prod_{i=1}^{k} (t - a_i), \tag{2.16}$$

with $a = (a_1, a_2, \ldots, a_n)$, or equivalently by

$$\prod_{i=1}^{k} (1 - a_i t)^{-1} = \sum_{n=k}^{\infty} S(n-1,k-1;a)\, t^{n-k}. \tag{2.17}$$

Note that

$$S(n+k-1,n-1;a) = \sum a_1^{r_1} a_2^{r_2} \cdots a_n^{r_n}, \tag{2.18}$$

where the summation is extended over all $r_i = 0, 1, \ldots, k$, $i = 1, 2, \ldots, n$, such that $r_1 + r_2 + \cdots + r_n = k$, is the homogeneous product sum symmetric function. The values of $S(k+i-1,k-1;a)$, for $i = 1, 2$ and $k = 1, 2, \ldots$, which will be used in the next sections, are quoted here for easy reference:

$$S(k,k-1;a) = \sum_{j=1}^{k} a_j, \qquad S(k+1,k-1;a) = \sum_{j=1}^{k} \sum_{i=1}^{j} a_i a_j. \tag{2.19}$$

The generalized Lah numbers, which are connected with the generalized Stirling numbers, may be defined by

$$\prod_{i=1}^{n} (t + a_i) = \sum_{k=0}^{n} C(n,k;a,b) \prod_{j=1}^{k} (t - b_j), \tag{2.20}$$

with $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_k)$. Expanding $\prod_{i=1}^{n} (t + a_i)$ into powers of $t$, using (2.13), and in the resulting expression expanding the powers of $t$ into products $\prod_{j=1}^{k} (t - b_j)$, $k = 0, 1, \ldots, n$, using (2.16), we deduce the relation

$$C(n,k;a,b) = \sum_{m=k}^{n} |s(n,m;a)|\, S(m,k;b). \tag{2.21}$$

Also, $C(n,k;a,0) = |s(n,k;a)|$ and $C(n,k;0,b) = S(n,k;b)$.

The generalized Stirling and Lah numbers for particular cases of the sequence $a_i$, $i = 1, 2, \ldots$, reduce to well-known numbers. Thus, for $a_i = 1/(\theta q^{i-1})$, $i = 1, 2, \ldots$, (2.13), on using the q-binomial formula (2.3), implies

$$|s(n,k;a)| = (1/\theta)^{n-k} q^{-\binom{n}{2} + \binom{k}{2}} \binom{n}{k}_q. \tag{2.22}$$
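The defining relations (2.13) and (2.16) translate directly into short procedures, which makes it possible to check closed forms such as (2.22) on small cases. The sketch below is illustrative only: the function names are not from the paper, and the recurrence used for the numbers of the second kind, $S(n,k;a) = S(n-1,k-1;a) + a_{k+1} S(n-1,k;a)$, is an assumption derived here from (2.16) by writing $t \prod_{i=1}^{k}(t-a_i) = \prod_{i=1}^{k+1}(t-a_i) + a_{k+1}\prod_{i=1}^{k}(t-a_i)$.

```python
from fractions import Fraction
from math import comb

def signless_stirling_first(a):
    """Return [|s(n,0;a)|, ..., |s(n,n;a)|], the coefficients of prod (t + a_i) in (2.13)."""
    coeffs = [Fraction(1)]                  # the constant polynomial 1
    for ai in a:
        new = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += ai * c                # contribution of a_i * t^k
            new[k + 1] += c                 # contribution of t * t^k
        coeffs = new
    return coeffs

def stirling_second(n, a):
    """Return [S(n,0;a), ..., S(n,n;a)] of (2.16); requires len(a) >= n."""
    row = [Fraction(1)]                     # S(0,0;a) = 1
    for m in range(1, n + 1):
        new = [Fraction(0)] * (m + 1)
        for k in range(m + 1):
            if k >= 1:
                new[k] += row[k - 1]
            if k <= m - 1:
                new[k] += a[k] * row[k]     # a[k] holds a_{k+1} (0-based list)
        row = new
    return row

def gaussian_binomial(n, k, q):
    """q-binomial coefficient for integers n >= 0, k >= 0."""
    value = Fraction(1)
    for i in range(k):
        value *= (1 - q**(n - i)) / (1 - q**(i + 1))
    return value

theta, q, n = Fraction(2), Fraction(1, 2), 4

# Closed form (2.22) for a_i = 1 / (theta q^(i-1)).
a = [1 / (theta * q**i) for i in range(n)]
closed_222 = [theta**(k - n) * q**(comb(k, 2) - comb(n, 2)) * gaussian_binomial(n, k, q)
              for k in range(n + 1)]
print(signless_stirling_first(a) == closed_222)

# For a_i = theta q^(i-1) the numbers of the second kind are theta^(n-k) times a q-binomial.
b = [theta * q**i for i in range(n)]
closed_second = [theta**(n - k) * gaussian_binomial(n, k, q) for k in range(n + 1)]
print(stirling_second(n, b) == closed_second)
```

Building the first-kind numbers by repeated polynomial multiplication costs $O(n^2)$ operations and avoids enumerating the $k$-combinations in (2.14).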
Also, for $a_i = \theta q^{i-1}$, $i = 1, 2, \ldots$, (2.17), on using the negative q-binomial formula (2.4), yields

$$S(n,k;a) = \theta^{n-k} \binom{n}{k}_q. \tag{2.23}$$

Further, for $a_i = \theta [r+i-1]_q = \theta (1 - q^{r+i-1})/(1-q)$, $i = 1, 2, \ldots$, (2.13), on using the expression

$$\prod_{i=1}^{n} (t + q [r+i-1]_q) = \sum_{k=0}^{n} |s_q(n,k;r)|\, t^k,$$

equivalent to (2.9), gives

$$|s(n,k;a)| = (\theta/q)^{n-k} |s_q(n,k;r)|. \tag{2.24}$$

Also, for $a_i = \theta [r+i-1]_q = \theta (1 - q^{r+i-1})/(1-q)$, $i = 1, 2, \ldots$, (2.16), on using the expression

$$t^n = \sum_{k=0}^{n} S_q(n,k;r) \prod_{i=1}^{k} (t - [r+i-1]_q),$$

equivalent to (2.10), yields

$$S(n,k;a) = \theta^{n-k} S_q(n,k;r). \tag{2.25}$$

Further, for $a_i = -(1 - q^{-(r+i-1)})$, $i = 1, 2, \ldots$, and $b_j = 1 - q^{j-1}$, $j = 1, 2, \ldots$, (2.20), by setting $t = 1 - q^u$, reduces to

$$[u+r+n-1]_{n,q} = \sum_{k=0}^{n} q^{\binom{n}{2} + \binom{k}{2} + rn} (1-q)^{k-n} C(n,k;a,b) [u]_{k,q},$$

which, compared to (2.6), with $x = n+r-1$ and $y = u$, implies

$$C(n,k;a,b) = (1-q)^{n-k} q^{-\binom{n}{2} + \binom{k}{2} - r(n-k)} \binom{n}{k}_q [n+r-1]_{n-k,q}. \tag{2.26}$$

It is noteworthy that $|L_q(n,k;r)| = q^{-\binom{n}{2} + \binom{k}{2} - r(n-k)} \binom{n}{k}_q [n+r-1]_{n-k,q}$ is the absolute noncentral q-Lah number. Also, for $a_i = -(1 - q^{-(r-i+1)})$, $i = 1, 2, \ldots, [r]$, and $b_j = 1 - q^{j-1}$, $j = 1, 2, \ldots$, (2.20), by setting $t = 1 - q^u$, reduces to

$$[u+r]_{n,q} = \sum_{k=0}^{n} q^{-\binom{n}{2} + \binom{k}{2} + rn} (1-q)^{k-n} C(n,k;a,b) [u]_{k,q},$$

which, compared to (2.6), with $x = r$ and $y = u$, implies

$$C(n,k;a,b) = (1-q)^{n-k} q^{\binom{n}{2} - \binom{k}{2} - (r+k)(n-k)} \binom{n}{k}_q [r]_{n-k,q}. \tag{2.27}$$

Finally, for $a_i = -(1 - q^{-(r+i-1)})$, $i = 1, 2, \ldots$, and $b_j = 1 - q^{s(j-1)}$, $j = 1, 2, \ldots$, $s > 0$, (2.20), by setting $t = 1 - q^{su}$, reduces to

$$[su+r+n-1]_{n,q} = \sum_{k=0}^{n} q^{\binom{n}{2} + s\binom{k}{2} + rn} \frac{(1-q^s)^k}{(1-q)^n} C(n,k;a,b) [u]_{k,q^s}.$$

Comparing it to the expression

$$[su+r+n-1]_{n,q} = [-1]_q^n [-su-r]_{n,q^{-1}} = \sum_{k=0}^{n} q^{\binom{n}{2} + s\binom{k}{2} + rn} |C_{q^{-1}}(n,k;-s,-r)|\, [u]_{k,q^s},$$

with $|C_{q^{-1}}(n,k;-s,-r)| = [-1]_q^n C_{q^{-1}}(n,k;-s,-r)$, which is readily deduced from (2.11), it follows that

$$C(n,k;a,b) = \frac{(1-q)^n}{(1-q^s)^k} |C_{q^{-1}}(n,k;-s,-r)|. \tag{2.28}$$
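The generalized Lah numbers of (2.20) can be generated by a two-term recurrence, which offers a small-case check of the closed form (2.26). The sketch below is a hedged illustration: the recurrence $C(n,k;a,b) = C(n-1,k-1;a,b) + (a_n + b_{k+1})\,C(n-1,k;a,b)$ is an assumption derived here from (2.20) by writing $t + a_n = (t - b_{k+1}) + (a_n + b_{k+1})$, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb, prod

def generalized_lah(n, a, b):
    """Return [C(n,0;a,b), ..., C(n,n;a,b)] of (2.20); requires len(a) >= n and len(b) >= n."""
    row = [Fraction(1)]                                 # C(0,0;a,b) = 1
    for m in range(1, n + 1):
        new = [Fraction(0)] * (m + 1)
        for k in range(m + 1):
            if k >= 1:
                new[k] += row[k - 1]
            if k <= m - 1:
                new[k] += (a[m - 1] + b[k]) * row[k]    # a[m-1] is a_m, b[k] is b_{k+1}
        row = new
    return row

def q_number(x, q):
    """q-number [x]_q for integer x."""
    return (1 - q**x) / (1 - q)

def q_falling(x, k, q):
    """q-factorial of x of order k."""
    return prod((q_number(x - i, q) for i in range(k)), start=Fraction(1))

def q_binomial(n, k, q):
    """q-binomial coefficient [n]_{k,q} / [k]_q!."""
    return q_falling(n, k, q) / q_falling(k, k, q)

def lah_closed_form(n, k, q, r):
    """Right-hand side of (2.26)."""
    exponent = -comb(n, 2) + comb(k, 2) - r * (n - k)
    return (1 - q)**(n - k) * q**exponent * q_binomial(n, k, q) * q_falling(n + r - 1, n - k, q)

q, r, n = Fraction(1, 2), 2, 4
a = [q**(-(r + i)) - 1 for i in range(n)]               # a_i = -(1 - q^{-(r+i-1)}), i = 1..n
b = [1 - q**j for j in range(n)]                        # b_j = 1 - q^{j-1}, j = 1..n
print(generalized_lah(n, a, b) == [lah_closed_form(n, k, q, r) for k in range(n + 1)])
```

With $b = 0$ the same routine returns the coefficients of $\prod (t + a_i)$, in agreement with the relation $C(n,k;a,0) = |s(n,k;a)|$ quoted after (2.21).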
3. Success probability varying with the number of trials

Poisson (1837) considered a sequence of independent Bernoulli trials with the probability of success at the $i$th trial,

$$P_i(\{s\}) = p_i, \quad i = 1, 2, \ldots,$$

varying with the number of trials. The probability function and factorial moments of the number of successes in a specific number of trials are given in the following theorem.

Theorem 3.1. The probability function of the number $X_n$ of successes in $n$ trials is given by

$$P(X_n = x) = \frac{|s(n,x;a)|}{\prod_{i=1}^{n} (1 + a_i)}, \quad x = 0, 1, \ldots, n, \tag{3.1}$$

where $a_i = (1-p_i)/p_i$, $i = 1, 2, \ldots, n$, and $|s(n,x;a)|$ is the generalized signless Stirling number of the first kind. Its factorial moments are given by

$$E[(X_n)_j] = \frac{j!\, |s(n,j;a+1)|}{\prod_{i=1}^{n} (1 + a_i)}, \quad j = 1, 2, \ldots, n, \tag{3.2}$$

where $a + 1 = (a_1 + 1, a_2 + 1, \ldots, a_n + 1)$, and $E[(X_n)_j] = 0$, for $j = n+1, n+2, \ldots$. In particular, its mean and variance are given by

$$E(X_n) = \sum_{i=1}^{n} \frac{1}{1 + a_i} = \sum_{i=1}^{n} p_i, \qquad V(X_n) = \sum_{i=1}^{n} \frac{a_i}{(1 + a_i)^2} = \sum_{i=1}^{n} p_i (1 - p_i). \tag{3.3}$$
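Formula (3.1) can be compared numerically with a direct computation that conditions on each trial in turn. The following sketch assumes all $p_i > 0$, so that $a_i = (1-p_i)/p_i$ is defined; the function names and the sample probabilities are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def pmf_via_stirling(p):
    """P(X_n = x), x = 0..n, from (3.1): |s(n,x;a)| / prod (1 + a_i) with a_i = (1 - p_i) / p_i."""
    a = [(1 - pi) / pi for pi in p]
    coeffs = [Fraction(1)]
    for ai in a:                                # expand prod (t + a_i) as in (2.13)
        new = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += ai * c
            new[k + 1] += c
        coeffs = new
    norm = Fraction(1)
    for ai in a:
        norm *= 1 + ai
    return [c / norm for c in coeffs]

def pmf_direct(p):
    """The same distribution obtained by adding one independent trial at a time."""
    dist = [Fraction(1)]
    for pi in p:
        new = [Fraction(0)] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1 - pi)           # failure at this trial
            new[x + 1] += prob * pi             # success at this trial
        dist = new
    return dist

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
pmf = pmf_via_stirling(p)
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x * x * px for x, px in enumerate(pmf)) - mean**2

print(pmf == pmf_direct(p))
print(mean == sum(p))                             # first part of (3.3)
print(variance == sum(pi * (1 - pi) for pi in p)) # second part of (3.3)
```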
Proof. The probability generating function of $X_n$ is given by

$$E(t^{X_n}) = \prod_{i=1}^{n} (1 - p_i + p_i t) = \prod_{i=1}^{n} (t + a_i) \Big/ \prod_{i=1}^{n} (1 + a_i),$$

which, by (2.13), implies (3.1). Further, the factorial moment generating function of $X_n$,

$$E[(1+t)^{X_n}] = \prod_{i=1}^{n} (1 + p_i t) = \prod_{i=1}^{n} (t + a_i + 1) \Big/ \prod_{i=1}^{n} (1 + a_i),$$

expanded into powers of $t$, using again (2.13), implies (3.2). The mean and variance of $X_n$, using (3.2), for $j = 1, 2$, and (2.15), are readily deduced as (3.3). Equivalently, using the expression of $X_n$ as a sum of $n$ independent zero–one Bernoulli random variables, these expressions are directly obtained. □

The probability function of the number $T_k$ of trials until the occurrence of the $k$th success, on using the expression $P(T_k = n) = P(X_{n-1} = k-1)\, p_n$, is readily deduced from (3.1). Thus, the following theorem is obtained.

Theorem 3.2. The probability function of the number $T_k$ of trials until the occurrence of the $k$th success is given by

$$P(T_k = n) = \frac{|s(n-1,k-1;a)|}{\prod_{i=1}^{n} (1 + a_i)}, \quad n = k, k+1, \ldots, \tag{3.4}$$

where $a_i = (1-p_i)/p_i$, $i = 1, 2, \ldots, n$, and $|s(n-1,k-1;a)|$ is the generalized signless Stirling number of the first kind.

Remark 3.1 (Number of failures until the occurrence of the kth success). The probability function of the number $W_k$ of failures until the occurrence of the $k$th success is closely connected to the probability function of the number $T_k$ of trials until the occurrence of the $k$th success. Clearly $W_k = T_k - k$ and so

$$P(W_k = w) = P(T_k = k + w), \quad w = 0, 1, \ldots . \tag{3.5}$$

The expression of the probability function of $X_n$ in terms of the generalized signless Stirling numbers of the first kind was derived by Platonov (1976). It should be noted that the moments of $T_k$ (and $W_k$) do not always exist. For example, in the case $p_i = 1/i$, $i = 1, 2, \ldots$, they do not exist, while in the case $p_i = 1/[i]_q$, $i = 1, 2, \ldots$, they do exist (cf. Examples 3.1 and 3.5). Conditions on the probability sequence $p_i$, $i = 1, 2, \ldots$, that guarantee the existence of the moments of $T_k$ have not been examined yet. In the following example an interesting application of these generalized binomial and negative binomial distributions in the theory of records, due to Balakrishnan and Nevzorov (1997), is presented.

Example 3.1 (Distributions of the number of records and the record times). Consider a sequence of random variables $X_i$, $i = 1, 2, \ldots$. The random variable $X_j$, $j \geq 2$, is called a record if $X_j > X_i$ for all $i = 1, 2, \ldots, j-1$; by convention $X_1$ is a record. Let $N_n$ be the number of records up to time (index) $n$ and $T_k$ the time of the $k$th record. Motivated by the increasing frequency of record breaking in the Olympic games, Yang (1975) proposed a model in which the breakings are attributed to the increase in the population size. Specifically, it was assumed that the random variable $X_i$, $i = 1, 2, \ldots$, is the maximum of an increasing number $\alpha_i$ of independent and identically distributed random variables; that is, $X_i = \max\{X_{i,1}, X_{i,2}, \ldots, X_{i,\alpha_i}\}$, where $X_{i,j}$, $j = 1, 2, \ldots, \alpha_i$, $i = 1, 2, \ldots$, is a double sequence of independent and identically distributed random variables, with an absolutely continuous distribution function $F(x)$, and $\alpha_i$ is the population size of the athletes of the world at the $i$th Olympic game, $i = 1, 2, \ldots$. Then, $X_i$, $i = 1, 2, \ldots$, is a sequence of independent random variables with $F_{X_i}(x) = [F(x)]^{\alpha_i}$, $i = 1, 2, \ldots$.

In order to find the distributions of the random variables $N_n$ and $T_k$, consider the record indicator random variables $I_j$, $j = 1, 2, \ldots$, defined by $I_j = 1$ if $X_j$ is a record and $I_j = 0$ if $X_j$ is not a record. Nevzorov (1984) proved that the record indicator random variables $I_j$, $j = 1, 2, \ldots$, are independent and $p_j = P(I_j = 1) = \alpha_j/(\alpha_1 + \alpha_2 + \cdots + \alpha_j)$, $j = 1, 2, \ldots$, with $\alpha_i > 0$, $i = 1, 2, \ldots$, not necessarily integers. Therefore, the probability function and factorial moments of $N_n$ are given by (3.1) and (3.2), respectively, where $a_i = (\alpha_1 + \alpha_2 + \cdots + \alpha_{i-1})/\alpha_i$. Also, the probability function of $T_k$ is given by (3.4). Note that the moments of $T_k$ do not always exist. For example, in the classical case $\alpha_i = 1$, $i = 1, 2, \ldots$, they do not exist. A case in which these moments do exist is presented in Example 3.5.
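The record model of Example 3.1 can be explored numerically by building the record probabilities $p_j = \alpha_j/(\alpha_1 + \cdots + \alpha_j)$ and then using the independence of the record indicators. The sketch below is a minimal illustration under those assumptions; it works with the $p_j$ directly (equivalent to (3.1) and (3.4) but valid even when $a_1 = 0$), and the function names are hypothetical.

```python
from fractions import Fraction

def record_probabilities(alpha):
    """p_j = alpha_j / (alpha_1 + ... + alpha_j) for j = 1..len(alpha)."""
    probs, total = [], Fraction(0)
    for aj in alpha:
        total += aj
        probs.append(Fraction(aj) / total)
    return probs

def count_pmf(p):
    """Distribution of the number of successes in independent trials with probabilities p."""
    dist = [Fraction(1)]
    for pi in p:
        new = [Fraction(0)] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1 - pi)
            new[x + 1] += prob * pi
        dist = new
    return dist

def record_time_pmf(alpha, k):
    """P(T_k = n) = P(N_{n-1} = k-1) p_n for n = k..len(alpha), returned as a dict."""
    p = record_probabilities(alpha)
    return {n: count_pmf(p[:n - 1])[k - 1] * p[n - 1] for n in range(k, len(alpha) + 1)}

# Classical case alpha_i = 1: p_j = 1/j.
print(count_pmf(record_probabilities([1, 1, 1])))   # pmf of N_3: 0, 1/3, 1/2, 1/6
print(record_time_pmf([1] * 5, 2))                  # P(T_2 = n) = 1/(n(n-1)) for n = 2..5
```

In the classical case the values $P(T_2 = n) = 1/(n(n-1))$ decay so slowly that $\sum_n n\,P(T_2 = n)$ diverges, which illustrates the remark that the moments of $T_k$ need not exist.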
The probability functions of the random variables $X_n$ and $T_k$ in the case of a geometrically varying success probability can be deduced from (3.1) and (3.4), respectively. Also, the factorial moments of $X_n$ may be deduced from (3.2). Alternatively, their derivation is facilitated by the following expression of the factorial moments in terms of the q-factorial moments, which was essentially obtained by Dunkl (1981) (see also Charalambides and Papadatos, 2005).

Lemma 3.1. The factorial moments of a q-distribution are expressed in terms of its q-factorial moments by

$$E[(X)_j] = j! \sum_{m=j}^{\infty} |s_q(m,j)|\, (q^{-1} - 1)^{m-j} \frac{E([X]_{m,q})}{[m]_q!}, \quad j = 1, 2, \ldots, \tag{3.6}$$

where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind.

3.1. Success odds geometrically decreasing

Consider first a sequence of independent Bernoulli trials and assume that the odds of success at the $i$th trial is given by

$$\theta_i = \theta q^{i-1}, \quad i = 1, 2, \ldots, \ 0 < q < 1, \ 0 < \theta < \infty,$$

which is a geometrically decreasing sequence with rate (proportion) $q$. Since the odds of success at the $i$th trial, $\theta_i$, is connected to the probability of success at the $i$th trial, $p_i$, by $\theta_i = p_i/(1-p_i)$, it follows that

$$p_i = \frac{\theta q^{i-1}}{1 + \theta q^{i-1}}, \quad i = 1, 2, \ldots, \ 0 < q < 1, \ 0 < \theta < \infty.$$

The probability function and the factorial moments of the number of successes in a specific number of trials are given in the following corollary of Theorem 3.1.

Corollary 3.1. The probability function of the number $X_n$ of successes in $n$ trials is given by

$$P(X_n = x) = \binom{n}{x}_q \frac{q^{\binom{x}{2}} \theta^x}{\prod_{i=1}^{n} (1 + \theta q^{i-1})}, \quad x = 0, 1, \ldots, n, \tag{3.7}$$

for $0 < \theta < \infty$ and $0 < q < 1$. Its factorial moments are given by

$$E[(X_n)_j] = j! \sum_{m=j}^{n} (q^{-1} - 1)^{m-j} |s_q(m,j)| \binom{n}{m}_q \frac{\theta^m q^{\binom{m}{2}}}{\prod_{i=1}^{m} (1 + \theta q^{i-1})}, \quad j = 1, 2, \ldots, n, \tag{3.8}$$

and $E[(X_n)_j] = 0$, for $j = n+1, n+2, \ldots$, where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by

$$E(X_n) = \sum_{i=1}^{n} \frac{\theta q^{i-1}}{1 + \theta q^{i-1}}, \qquad V(X_n) = \sum_{i=1}^{n} \frac{\theta q^{i-1}}{(1 + \theta q^{i-1})^2}.$$

Proof. The probability function (3.7) is readily deduced from (3.1) by setting $a_i = 1/(\theta q^{i-1})$, $i = 1, 2, \ldots$, and using (2.22). The factorial moments (3.8) can be more easily obtained by using Lemma 3.1. Specifically, the $m$th q-factorial moment of $X_n$, on using (3.7), is expressed as

$$E([X_n]_{m,q}) = \sum_{x=m}^{n} [x]_{m,q} \binom{n}{x}_q \frac{q^{\binom{x}{2}} \theta^x}{\prod_{i=1}^{n} (1 + \theta q^{i-1})} = \frac{[n]_{m,q}\, q^{\binom{m}{2}} \theta^m}{\prod_{i=1}^{n} (1 + \theta q^{i-1})} \sum_{x=m}^{n} \binom{n-m}{x-m}_q q^{\binom{x-m}{2}} (\theta q^m)^{x-m},$$

which, by the q-binomial formula (2.3), yields

$$E([X_n]_{m,q}) = \frac{[n]_{m,q}\, q^{\binom{m}{2}} \theta^m}{\prod_{i=1}^{m} (1 + \theta q^{i-1})}, \quad m = 1, 2, \ldots .$$

Introducing the last expression into (3.6), the required formula (3.8) is deduced. The mean and variance of $X_n$ may be deduced from (3.8), using (2.12), or directly from (3.3). □

Kemp and Kemp (1991) examined this distribution in their discussion of selecting a stochastic model that fits Weldon's classical dice data. Specifically, in a throw of $n$ distinguishable dice it was assumed that the odds of success of the $i$th die satisfies a log-linear relation

$$\log \theta_i = \log \theta + (i-1) \log q, \quad i = 1, 2, \ldots, n,$$

where the occurrence of 5 or 6 in a throw of a die is reckoned a success. Then, the probability function of the number $X_n$ of successes was obtained as (3.7).

Remark 3.2 (Success odds geometrically increasing). Consider the case in which the odds of success at the $i$th trial is given by

$$\theta_i = \lambda Q^{i-1}, \quad i = 1, 2, \ldots, \ Q > 1, \ 0 < \lambda < \infty,$$
which is a geometrically increasing sequence with rate (proportion) $Q$. It may be transformed to the above case of geometrically decreasing success odds by interchanging the notions of success and failure and setting $\lambda = \theta^{-1}$ and $Q = q^{-1}$.

Remark 3.3 (Stationary distribution in a birth and death process). Consider a homogeneous birth and death process $X(t)$, $t \geq 0$, with birth and death rates $\lambda_j > 0$, $j = 0, 1, \ldots$, and $\mu_j > 0$, $j = 1, 2, \ldots$, respectively. The stationary distribution $\lim_{t \to \infty} P[X(t) = x] = P(X = x)$, $x = 0, 1, \ldots$, satisfies the recurrence relation

$$P(X = x) = (\lambda_{x-1}/\mu_x)\, P(X = x-1), \quad x = 1, 2, \ldots,$$

and so

$$P(X = x) = P(X = 0) \prod_{j=1}^{x} \frac{\lambda_{j-1}}{\mu_j}, \quad x = 1, 2, \ldots,$$

with

$$P(X = 0) = \left(1 + \sum_{x=1}^{\infty} \prod_{j=1}^{x} \frac{\lambda_{j-1}}{\mu_j}\right)^{-1},$$

provided $\sum_{x=1}^{\infty} \prod_{j=1}^{x} (\lambda_{j-1}/\mu_j) < \infty$. Clearly, any probability function satisfying a recurrence relation of this form may be interpreted as a stationary distribution of a birth and death process. Thus, the q-binomial distribution (3.7) may be considered as the stationary distribution of a birth and death process with birth and death rates

$$\lambda_j = \theta q^j [n-j]_q = \theta ([n]_q - [j]_q), \quad j = 0, 1, \ldots, n, \qquad \mu_j = [j]_q, \quad j = 1, 2, \ldots, n.$$

An application of the q-binomial distribution (3.7), examined by Kemp and Newton (1990), is given in the following example.

Example 3.2 (Stationary distributions for dichotomized parasite populations). Consider a population of size $n$ comprising two types of parasites, active and passive. This population is dichotomized into the sub-populations of type A parasites, which are on hosts without open wounds, and of type B parasites, which are on hosts with open wounds. In order to simplify the model, assume that no host has more than one parasite. Let $X_n(t)$ be the number of active parasites on hosts without open wounds at time $t \geq 0$. Suppose that at time $t$, $j$ of the $n$ parasites are on hosts without open wounds (type A parasites) and the other $n-j$ parasites are on hosts with open wounds (type B parasites). If a host which has never previously been parasitized is available, then one of the active parasites may transfer to it instead of relocating on the existing host; active type A parasites are assumed to take priority over active type B parasites. Let $p$ be the probability that a parasite is active, whence $q = 1-p$ is the probability that it is passive. Also, let $\theta/(1+\theta)$ be the probability that an active parasite is able to move to a host that has never previously been parasitized. Then, $X_n(t)$ is a birth and death process with birth and death rates proportional to

$$\beta_j = \frac{\theta q^j (1 - q^{n-j})}{1 + \theta}, \quad j = 0, 1, \ldots, n, \qquad \delta_j = \frac{1 - q^j}{1 + \theta}, \quad j = 1, 2, \ldots, n.$$

Therefore, according to Remark 3.3, the stationary distribution $P(X_n = x)$, $x = 0, 1, \ldots, n$, is given by (3.7). The number $Y_n(t)$ of active parasites on hosts with open wounds at time $t \geq 0$ is also a birth and death process and its stationary distribution $P(Y_n = x)$, $x = 0, 1, \ldots, n$, according to Remark 3.2, is given by (3.7) with parameters $\theta$ and $q$ replaced by $\lambda = \theta^{-1}$ and $Q = q^{-1}$.

The probability function and factorial moments of the number of failures until the occurrence of a given number of successes are obtained in the following corollary.

Corollary 3.2. The probability function of the number $W_k$ of failures until the occurrence of the $k$th success is given by

$$P(W_k = w) = \binom{k+w-1}{w}_q \frac{\theta^k q^{\binom{k}{2} + w}}{\prod_{i=1}^{k+w} (1 + \theta q^{i-1})}, \quad w = 0, 1, \ldots, \tag{3.9}$$

for $0 < \theta < \infty$ and $0 < q < 1$. Its factorial moments are given by

$$E[(W_k)_j] = j! \sum_{m=j}^{\infty} (q^{-1} - 1)^{m-j} |s_q(m,j)| \binom{k+m-1}{m}_q \frac{1}{\theta^m q^{\binom{m}{2} + km}}, \quad j = 1, 2, \ldots, \tag{3.10}$$

where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind.

Proof. The probability function (3.9) is readily deduced from (3.5) together with (3.4), by setting $a_i = 1/(\theta q^{i-1})$, $i = 1, 2, \ldots$, and using (2.22). The factorial moments (3.10) can be more easily obtained by using Lemma 3.1. Specifically, the $m$th q-factorial moment of $W_k$ is expressed as

$$E([W_k]_{m,q}) = \sum_{w=m}^{\infty} [w]_{m,q} \binom{k+w-1}{w}_q \frac{\theta^k q^{\binom{k}{2} + w}}{\prod_{i=1}^{k+w} (1 + \theta q^{i-1})} = [k+m-1]_{m,q}\, \theta^k q^{\binom{k}{2}} \sum_{w=m}^{\infty} \binom{k+w-1}{w-m}_q \frac{q^w}{\prod_{i=1}^{k+w} (1 + \theta q^{i-1})}$$

and, since

$$\sum_{w=m}^{\infty} \binom{k+w-1}{w-m}_q \frac{q^w}{\prod_{i=1}^{k+w} (1 + \theta q^{i-1})} = \frac{1}{\theta^{k+m} q^{\binom{k+m}{2}}},$$

is obtained as

$$E([W_k]_{m,q}) = \frac{[k+m-1]_{m,q}}{\theta^m q^{\binom{m}{2} + km}}, \quad m = 1, 2, \ldots .$$

Introducing the last expression into (3.6), the required formula (3.10) is deduced. □
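The q-binomial probability function (3.7) and the probability function (3.9) can be checked on small cases against the underlying Bernoulli trials with $p_i = \theta q^{i-1}/(1 + \theta q^{i-1})$. The sketch below also evaluates (3.8) for $j = 1$, using $|s_q(m,1)| = q^{m-1}[m-1]_q!$ from (2.12), and compares it with the mean $\sum p_i$. The function names and the parameter values are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def q_int(m, q):
    """q-integer [m]_q = 1 + q + ... + q^(m-1) for an integer m >= 0."""
    return sum((q**i for i in range(m)), Fraction(0))

def q_fact(m, q):
    """q-factorial [m]_q! = [1]_q [2]_q ... [m]_q."""
    value = Fraction(1)
    for i in range(1, m + 1):
        value *= q_int(i, q)
    return value

def q_binom(n, k, q):
    """q-binomial coefficient for integers 0 <= k <= n."""
    return q_fact(n, q) / (q_fact(k, q) * q_fact(n - k, q))

def denom(m, theta, q):
    """prod_{i=1}^{m} (1 + theta q^(i-1))."""
    value = Fraction(1)
    for i in range(m):
        value *= 1 + theta * q**i
    return value

def q_binomial_pmf(n, theta, q):
    """Probabilities (3.7) for x = 0..n."""
    return [q_binom(n, x, q) * q**comb(x, 2) * theta**x / denom(n, theta, q)
            for x in range(n + 1)]

def failures_pmf(k, w, theta, q):
    """Probability (3.9) of w failures before the kth success."""
    return (q_binom(k + w - 1, w, q) * theta**k * q**(comb(k, 2) + w)
            / denom(k + w, theta, q))

def bernoulli_count_pmf(p):
    """Distribution of the number of successes in independent trials with probabilities p."""
    dist = [Fraction(1)]
    for pi in p:
        new = [Fraction(0)] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1 - pi)
            new[x + 1] += prob * pi
        dist = new
    return dist

def mean_via_3_8(n, theta, q):
    """Formula (3.8) with j = 1 and |s_q(m,1)| = q^(m-1) [m-1]_q!."""
    return sum((1 / q - 1)**(m - 1) * q**(m - 1) * q_fact(m - 1, q) * q_binom(n, m, q)
               * theta**m * q**comb(m, 2) / denom(m, theta, q)
               for m in range(1, n + 1))

theta, q, n, k = Fraction(3, 2), Fraction(1, 2), 5, 2
p = [theta * q**i / (1 + theta * q**i) for i in range(n)]

print(q_binomial_pmf(n, theta, q) == bernoulli_count_pmf(p))
print(mean_via_3_8(n, theta, q) == sum(p))
# (3.9) against P(T_k = k + w) = P(X_{k+w-1} = k-1) p_{k+w}
print(all(failures_pmf(k, w, theta, q)
          == bernoulli_count_pmf(p[:k + w - 1])[k - 1] * p[k + w - 1]
          for w in range(n - k + 1)))
```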
Remark 3.4 (Number of successes until the occurrence of the kth failure). The probability function of the number $Y_k$ of successes until the occurrence of the $k$th failure is closely connected to (3.9). Clearly

$$P(Y_k = y) = P(X_{k+y-1} = y)(1 - p_{k+y}) = P(W_{y+1} = k-1)\, \frac{1 - p_{k+y}}{p_{k+y}}$$

and so

$$P(Y_k = y) = \binom{k+y-1}{y}_q \frac{q^{\binom{y}{2}} \theta^y}{\prod_{i=1}^{k+y} (1 + \theta q^{i-1})}, \quad y = 0, 1, \ldots, \tag{3.11}$$

for $0 < \theta < \infty$ and $0 < q < 1$. The factorial moments of $Y_k$ can be easily obtained as

$$E[(Y_k)_j] = j! \sum_{m=j}^{\infty} (q^{-1} - 1)^{m-j} |s_q(m,j)| \binom{k+m-1}{m}_q \frac{\theta^m q^{\binom{m}{2}}}{\prod_{i=1}^{m} (1 + \theta q^{i-1})}, \quad j = 1, 2, \ldots . \tag{3.12}$$
3.2. Success probability geometrically decreasing

Let us now consider a sequence of independent Bernoulli trials and assume that the probability of success at the ith trial is given by
\[
p_i = \theta q^{i-1}, \quad i = 1,2,\ldots,\ 0 < q < 1,\ 0 < \theta \le 1,
\]
which is a geometrically decreasing sequence with rate (proportion) q. The probabilities involved in this model are more conveniently written in terms of a new parameter r that replaces the parameter $\theta$ by $\theta = q^{r}$. Then
\[
p_i = q^{r+i-1}, \quad i = 1,2,\ldots,\ 0 < q < 1,\ 0 \le r < \infty.
\]
The probability function and factorial moments of the number of successes in a specific number of trials are given in the following corollary.

Corollary 3.3. The probability function of the number $X_n$ of successes in n trials is given by
\[
P(X_n = x) = q^{\binom{n}{2}+rn}(1-q)^{n-x}\,|s_{q^{-1}}(n,x;r)|, \quad x = 0,1,\ldots,n, \tag{3.13}
\]
with $0 < q < 1$ and $0 \le r < \infty$, where $|s_q(n,x;r)|$ is the noncentral signless q-Stirling number of the first kind. Its factorial moments are given by
\[
E[(X_n)_j] = j!\,q^{\binom{j}{2}+rj} \begin{bmatrix} n \\ j \end{bmatrix}_q, \quad j = 1,2,\ldots,n, \tag{3.14}
\]
and $E[(X_n)_j] = 0$, for $j = n+1, n+2, \ldots$. In particular, its mean and variance are given by
\[
E(X_n) = q^{r}[n]_q, \qquad V(X_n) = q^{r}[n]_q - q^{2r}[n]_{q^2}.
\]
Proof. The probability function (3.13) is readily deduced from (3.1) by setting
\[
a_i = \frac{1-p_i}{p_i} = -(1-q^{-(r+i-1)}) = q^{-1}(1-q)[r+i-1]_{q^{-1}}, \quad i = 1,2,\ldots,
\]
and using (2.24). Also, since $a_i + 1 = 1/q^{r+i-1}$, $i = 1,2,\ldots$, the factorial moments (3.14) are obtained from (3.2) by using (2.22). The mean and variance of $X_n$ may be deduced from (3.14) or directly from (3.3). □

An application of this distribution in the theory of random graphs, examined by Crippa et al. (1997), is presented in the following example.
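The moment formulas of Corollary 3.3 can be checked numerically without the q-Stirling numbers. The following Python sketch builds the distribution of $X_n$ directly from the trial probabilities $p_i = q^{r+i-1}$ by a standard recursion over the trials and compares it with (3.14) and with the stated mean and variance. The function names and the parameter values are illustrative assumptions, not part of the paper, and r is taken to be an integer only so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def q_number(x, q):
    """Return [x]_q = 1 + q + ... + q**(x-1) for a non-negative integer x."""
    return sum(q**i for i in range(x))


def q_binomial(n, k, q):
    """Return the q-binomial coefficient [n choose k]_q (q != 1)."""
    if k < 0 or k > n:
        return 0
    num = den = q**0  # the number 1 in the same numeric type as q
    for i in range(1, k + 1):
        num *= 1 - q**(n - k + i)
        den *= 1 - q**i
    return num / den


def success_count_pmf(probs):
    """pmf of the number of successes in independent trials with success probabilities probs."""
    pmf = [1]
    for p in probs:
        new = [0] * (len(pmf) + 1)
        for x, w in enumerate(pmf):
            new[x] += w * (1 - p)   # failure at this trial
            new[x + 1] += w * p     # success at this trial
        pmf = new
    return pmf


def factorial_moment(pmf, j):
    """Return E[(X)_j], where (x)_j = x (x-1) ... (x-j+1)."""
    total = 0
    for x, w in enumerate(pmf):
        term = w
        for i in range(j):
            term *= x - i
        total += term
    return total


# Illustrative (hypothetical) parameter values.
q, r, n = Fraction(1, 2), 1, 5
pmf = success_count_pmf([q**(r + i - 1) for i in range(1, n + 1)])
mean = factorial_moment(pmf, 1)
variance = factorial_moment(pmf, 2) + mean - mean**2
```

Exact fractions are used so that the comparison with the closed forms is an equality rather than a floating-point approximation.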
Example 3.3 (The number of sources (or sinks) in a random acyclic digraph). A graph G is a pair (V,E), with V the set of vertices (nodes) and $E \subseteq V^2$ the set of edges. An acyclic digraph (directed graph) is a graph with $E = \{(i,j) : i,j \in V,\ i < j\}$ or $E = \{(i,j) : i,j \in V,\ i > j\}$. A vertex (node) in an acyclic digraph is called a source (or sink) if it does not have any predecessor (or successor). Let us denote by $G_{n,q}$ a random graph of n nodes in which any edge occurs independently with probability $p = 1-q$. Consider the sequential construction of an acyclic random digraph $G_{k,q}$ from $G_{k-1,q}$ through the addition of the node k. This addition will create a new source if no node of $G_{k-1,q}$ is connected to k. Thus, the probability $p_k$ that the addition of node k to $G_{k-1,q}$ creates a new source is given by $p_k = q^{k-1}$, $k = 1,2,\ldots$. Since the sequential additions of nodes constitute a sequence of independent Bernoulli trials, with success the creation of a new source, it follows from Corollary 3.3 that the probability function of the number $X_n$ of sources in a random acyclic digraph $G_{n,q}$ is given by (3.13), with r = 0.

The probability function of the number of trials until the occurrence of a given number of successes is obtained in the following corollary of Theorem 3.2. This probability function was given in Charalambides (2004).

Corollary 3.4. The probability function of the number $T_k$ of trials until the occurrence of the kth success is given by
\[
P(T_k = n) = q^{\binom{n}{2}+rn}(1-q)^{n-k}\,|s_{q^{-1}}(n-1,k-1;r)|, \quad n = k,k+1,\ldots, \tag{3.15}
\]
with $0 < q < 1$ and $0 \le r < \infty$, where $|s_q(n-1,k-1;r)|$ is the noncentral signless q-Stirling number of the first kind.

Remark 3.5 (Success probability geometrically increasing). Consider the case in which the probability of success at the ith trial is given by
\[
p_i = \theta Q^{i-1}, \quad i = 1,2,\ldots,[r+1],\ Q > 1,\ 0 < \theta \le 1,
\]
with [r+1] the integral part of r+1 and $r = -\log\theta/\log Q$, which is a geometrically increasing sequence with rate (proportion) Q. It may be transformed to the preceding case by setting $\theta = q^{r} = (q^{-1})^{-r}$ and $Q = q^{-1}$; the upper bound $i \le [r+1]$ ensures that $p_i \le 1$.

An application of this stochastic model as a defense model against an approaching attacker (missile), due to Kemp (1987), is discussed in the next example.

Example 3.4 (Successful shots at an approaching attacker). Consider a defense situation in which the motion of an attacker (missile) relative to the defender is assumed to be towards the defender with relative speed v. The attacker will annihilate the defender if it reaches him before he has disabled it. The defender fires a sequence of shots at the attacker when the distance between them is $D_i = d-(i-1)uv$, $i = 1,2,\ldots,m$, where d is the distance when the first shot is fired, u is the time interval between shots and m is a fixed number such that $d-(m-1)uv > 0$. Suppose that the distance a shot would travel is an exponential random variable with mean $1/\lambda$, that the shot is correctly aimed, and that it is successful if it reaches the attacker. Then the probability that the ith shot is successful is
\[
p_i = \exp\{\lambda(i-1)uv - \lambda d\}, \quad i = 1,2,\ldots,m.
\]
Considering the sequence of shots as a sequence of independent Bernoulli trials, and setting $\theta = \exp\{-\lambda d\}$ and $Q = \exp\{\lambda uv\}$, it follows that the probability of success (successful shot) at the ith trial is given by
\[
p_i = \theta Q^{i-1}, \quad i = 1,2,\ldots,[r+1],\ Q > 1,\ 0 < \theta \le 1,
\]
with [r+1] the integral part of r+1 and $r = -\log\theta/\log Q$. This is the model of Remark 3.5, and so the probability function and factorial moments of the number $X_n$ of successful shots are deduced from (3.13) and (3.14), by replacing q by $Q = q^{-1}$ and r by $-r$, as
\[
P(X_n = x) = q^{-\binom{n}{2}+rn}(1-q^{-1})^{n-x}\,|s_q(n,x;-r)|, \quad x = 0,1,\ldots,n
\]
and
\[
E[(X_n)_j] = j!\,q^{-\binom{j}{2}+rj} \begin{bmatrix} n \\ j \end{bmatrix}_{q^{-1}}, \quad j = 1,2,\ldots,n.
\]
The mean and variance are obtained as
\[
E(X_n) = q^{r}[n]_{q^{-1}}, \qquad V(X_n) = q^{r}[n]_{q^{-1}} - q^{2r}[n]_{q^{-2}}.
\]
An equivalent expression of the probability function of $X_n$ as an alternating sum of its binomial moments $b_j = E\left[\binom{X_n}{j}\right]$ was given in Kemp (1987). A second random variable, which is also of interest in this defense model, is the number $T_k$ of shots until the occurrence of the kth successful shot. Its probability function is deduced from (3.15), by replacing q by $Q = q^{-1}$ and r by $-r$, as
\[
P(T_k = n) = q^{-\binom{n}{2}+rn}(1-q^{-1})^{n-k}\,|s_q(n-1,k-1;-r)|, \quad n = k,k+1,\ldots.
\]
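As a numerical illustration of Example 3.4, the sketch below evaluates the shot probabilities $p_i = \exp\{\lambda(i-1)uv - \lambda d\}$, builds the distribution of the number of successful shots by a direct recursion over the shots, and compares its mean with $\theta[m]_Q$, which is the mean $q^{r}[n]_{q^{-1}}$ above written in terms of $\theta = q^{r}$ and $Q = q^{-1}$. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def shot_success_probs(lam, d, u, v, m):
    """Return p_i = exp(lam*(i-1)*u*v - lam*d) for i = 1..m (requires d - (m-1)*u*v > 0)."""
    if d - (m - 1) * u * v <= 0:
        raise ValueError("the last shot must be fired before the attacker arrives")
    return [math.exp(lam * (i - 1) * u * v - lam * d) for i in range(1, m + 1)]


def success_count_pmf(probs):
    """pmf of the number of successes in independent trials with success probabilities probs."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for x, w in enumerate(pmf):
            new[x] += w * (1 - p)   # the shot misses
            new[x + 1] += w * p     # the shot reaches the attacker
        pmf = new
    return pmf


# Hypothetical scenario: rate lam, initial distance d, shot interval u, speed v, m shots.
lam, d, u, v, m = 0.5, 10.0, 1.0, 2.0, 4
probs = shot_success_probs(lam, d, u, v, m)
theta, Q = math.exp(-lam * d), math.exp(lam * u * v)

pmf = success_count_pmf(probs)
mean_from_pmf = sum(x * w for x, w in enumerate(pmf))
mean_closed_form = theta * (Q**m - 1) / (Q - 1)   # theta * [m]_Q
prob_no_hit = pmf[0]                               # no shot is successful
```

The quantity `prob_no_hit` is the probability that none of the m shots reaches the attacker, which is the event of interest to the defender in this model.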
3.3. Failure odds geometrically increasing

Finally, consider a sequence of independent Bernoulli trials and assume that the odds of failure at the ith trial are given by
\[
\lambda_i = q[r+i-1]_q = \frac{q}{1-q}\,(1-q^{r+i-1}), \quad i = 1,2,\ldots,\ 0 < q < 1,
\]
which is a geometrically increasing sequence with rate (proportion) q. Since the odds of failure at the ith trial, $\lambda_i$, are connected to the probability of success at the ith trial, $p_i$, by $\lambda_i = (1-p_i)/p_i$, it follows that
\[
p_i = \frac{1}{[r+i]_q}, \quad i = 1,2,\ldots,\ 0 < q < 1,
\]
which is a decreasing sequence. The probability function and factorial moments of the number of successes in a specific number of trials are given in the following corollary.

Corollary 3.5. The probability function of the number $X_n$ of successes in n trials is given by
\[
P(X_n = x) = \frac{|s_q(n,x;r)|}{[r+n]_{n,q}}, \quad x = 0,1,\ldots,n, \tag{3.16}
\]
with $0 < q < 1$, $0 \le r < \infty$, where $|s_q(n,x;r)|$ is the noncentral signless q-Stirling number of the first kind. Its factorial moments are given by
\[
E[(X_n)_j] = \frac{j!\,q^{-(n-j)}\,|s_q(n,j;r+1)|}{[r+n]_{n,q}}, \quad j = 1,2,\ldots,n, \tag{3.17}
\]
and $E[(X_n)_j] = 0$, for $j = n+1, n+2, \ldots$. In particular, its mean and variance are given by
\[
E(X_n) = \sum_{i=1}^{n} \frac{1}{[r+i]_q}, \qquad V(X_n) = \sum_{i=1}^{n} \frac{q[r+i-1]_q}{[r+i]_q^{2}}.
\]
Proof. The probability function (3.16) is readily deduced from (3.1) by setting $a_i = q[r+i-1]_q$, $i = 1,2,\ldots$, and using (2.24), with $\theta = q$. Similarly, since $1 + a_i = [r+i]_q$, $i = 1,2,\ldots$, its factorial moments are deduced from (3.2) as (3.17). The mean and variance of $X_n$ may be directly deduced from (3.3). □

The probability function of the number of trials until the occurrence of a given number of successes may be similarly deduced from (3.4).

Corollary 3.6. The probability function of the number $T_k$ of trials until the occurrence of the kth success is given by
\[
P(T_k = n) = \frac{|s_q(n-1,k-1;r)|}{[r+n]_{n,q}}, \quad n = k,k+1,\ldots, \tag{3.18}
\]
with $0 < q < 1$, $0 \le r < \infty$, where $|s_q(n-1,k-1;r)|$ is the noncentral signless q-Stirling number of the first kind.

In the following example, which is a continuation of Example 3.1, an interesting application of these distributions in the theory of record breaking, examined in Charalambides (2007), is presented.

Example 3.5 (A model for the breaking of records in the Olympic games). Returning to Example 3.1, consider the particular case of a geometrically increasing population, $a_i = \theta q^{-i+1}$, $i = 1,2,\ldots$. Then $p_i = 1/[i]_q$, $i = 1,2,\ldots$, and the probability function and factorial moments of the number $X_n$ of records up to time n are given by (3.16) and (3.17) with r = 0, where $[n]_{n,q} = [n]_q!$ and $|s_q(n,j;1)| = |s_q(n+1,j+1)|$. Also, the probability function of the kth record time $T_k$ is given by (3.18) with r = 0. The mean of $T_k$, which in this case does exist, was obtained by Charalambides (2007), after some algebraic manipulations, as

! 1 i1 i X k q qik i ð1qÞ EðTk Þ ¼ : ð1Þi qð2Þ 1 þ ½iq ! ½iq 1q i ¼ 1 ½i þ1kq

Analogous results were obtained by Charalambides (2009) for the more general case of a q-factorially increasing population. Yang (1975), in a very interesting paper, obtained the asymptotic distribution of the kth inter-record time $W_k = T_k - T_{k-1}$ as a geometric distribution with success probability $p = 1-q$.
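The record model with $p_i = 1/[i]_q$ (the case r = 0 of Corollary 3.5) lends itself to a direct numerical check of the mean and variance formulas. The Python sketch below computes the distribution of the number of records by a recursion over the trials, so it does not rely on any particular normalization of the q-Stirling numbers. The function names and the values of q and n are illustrative assumptions.

```python
from fractions import Fraction


def q_number(x, q):
    """Return [x]_q = 1 + q + ... + q**(x-1) for a non-negative integer x."""
    return sum(q**i for i in range(x))


def success_count_pmf(probs):
    """pmf of the number of successes in independent trials with success probabilities probs."""
    pmf = [1]
    for p in probs:
        new = [0] * (len(pmf) + 1)
        for x, w in enumerate(pmf):
            new[x] += w * (1 - p)   # observation i is not a record
            new[x + 1] += w * p     # observation i is a record
        pmf = new
    return pmf


# Illustrative (hypothetical) values; p_i = 1/[i]_q, so the first observation is always a record.
q, n = Fraction(2, 3), 6
record_probs = [1 / q_number(i, q) for i in range(1, n + 1)]
pmf = success_count_pmf(record_probs)

mean = sum(x * w for x, w in enumerate(pmf))
variance = sum(x * x * w for x, w in enumerate(pmf)) - mean**2
```

Since $p_i \to 1-q$ as i grows, the late record probabilities in `record_probs` approach $1-q$, in line with the geometric limit for inter-record times mentioned above.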
4. Success probability varying with the number of successes

Consider a sequence of independent geometric sequences of trials with probability of success at the jth geometric sequence of trials,
\[
Q_j(\{s\}) = p_j, \quad j = 1,2,\ldots,
\]
varying with the number of geometric sequences of trials, or equivalently, varying with the number of successes in the sequence of Bernoulli trials. The probability function and factorial moments of the number of trials until the occurrence of a specific number of successes are given in the following theorem.

Theorem 4.1. The probability function of the number $T_k$ of trials until the occurrence of the kth success is given by
\[
P(T_k = n) = \left(\prod_{j=1}^{k} p_j\right) S(n-1,k-1;\mathbf{q}), \quad n = k,k+1,\ldots, \tag{4.1}
\]
where $q_j = 1-p_j$, $j = 1,2,\ldots$, and $S(n-1,k-1;\mathbf{q})$ is the generalized Stirling number of the second kind. Its ascending factorial moments are given by
\[
E[(T_k+i-1)_i] = i!\,S(k+i-1,k-1;\mathbf{a}), \quad i = 1,2,\ldots, \tag{4.2}
\]
where $a_j = 1/p_j$, $j = 1,2,\ldots$. In particular, its mean and variance are given by
\[
E(T_k) = \sum_{j=1}^{k} a_j = \sum_{j=1}^{k} \frac{1}{p_j}, \qquad V(T_k) = \sum_{j=1}^{k} a_j(a_j-1) = \sum_{j=1}^{k} \frac{1-p_j}{p_j^{2}}. \tag{4.3}
\]
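Theorem 4.1 can be illustrated numerically. The sketch below assumes that the generalized Stirling number $S(n-1,k-1;\mathbf{q})$ in (4.1) equals the complete homogeneous symmetric polynomial of degree $n-k$ in $q_1,\ldots,q_k$, which is what expanding the product $\prod_{j=1}^{k}(1-q_j u)^{-1}$ in powers of u gives; this identification is an assumption made here, since the defining relation (2.17) is not reproduced in this part of the text. The function name and the probabilities are hypothetical.

```python
import math


def trials_until_kth_success_pmf(p, n_max):
    """Return [P(T_k = n) for n = 0..n_max], where k = len(p) and p[j-1] is the
    success probability in the j-th geometric sequence of trials (n_max >= k)."""
    k = len(p)
    # h[d] accumulates the complete homogeneous symmetric polynomial of degree d
    # in the failure probabilities q_1, ..., q_k, one variable at a time.
    h = [1.0] + [0.0] * (n_max - k)
    for pj in p:
        qj = 1 - pj
        for d in range(1, len(h)):
            h[d] += qj * h[d - 1]
    scale = math.prod(p)
    return [0.0] * k + [scale * hd for hd in h]


# Hypothetical success probabilities for k = 3 geometric sequences of trials.
p = [0.5, 0.6, 0.7]
pmf = trials_until_kth_success_pmf(p, n_max=200)

mean = sum(n * w for n, w in enumerate(pmf))
variance = sum(n * n * w for n, w in enumerate(pmf)) - mean**2
mean_from_4_3 = sum(1 / pj for pj in p)
variance_from_4_3 = sum((1 - pj) / pj**2 for pj in p)
```

The support is truncated at `n_max`, so the comparison with (4.3) holds up to a truncation error that is negligible for these values.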
Proof. Let $Y_j$ be the number of trials in the jth geometric sequence of trials, $j = 1,2,\ldots$. Then $T_k = Y_1 + Y_2 + \cdots + Y_k$, where $Y_j$, $j = 1,2,\ldots,k$, are independent geometric random variables, with probability function
\[
P(Y_j = y) = p_j q_j^{y-1}, \quad y = 1,2,\ldots,\ q_j = 1-p_j,\ j = 1,2,\ldots,
\]
and probability generating function
\[
E(u^{Y_j}) = p_j u (1-q_j u)^{-1}, \quad j = 1,2,\ldots,k.
\]
Consequently, the probability generating function of $T_k$ is given by
\[
E(u^{T_k}) = \prod_{j=1}^{k} p_j u (1-q_j u)^{-1} = \left(\prod_{j=1}^{k} p_j\right) u^{k} \prod_{j=1}^{k} (1-q_j u)^{-1},
\]
which, by (2.17), implies (4.1). Further, the ascending factorial moment generating function of $T_k$,
\[
E[(1-u)^{-T_k}] = \prod_{j=1}^{k} (1-a_j u)^{-1},
\]
expanded into powers of u, using again (2.17), implies (4.2). The mean and variance of $T_k$, using (4.2), for i = 1,2, and (2.19), are readily deduced as (4.3). Equivalently, using the expression of $T_k$ as a sum of k independent geometric random variables, these expressions are directly obtained. □

The probability function of the number $X_n$ of successes up to the nth Bernoulli trial, since $P(X_n = x) = P(T_{x+1} = n+1)/p_{x+1}$, is readily deduced from (4.1). Thus, the following theorem is obtained.

Theorem 4.2. The probability function of the number $X_n$ of successes in n trials is given by
\[
P(X_n = x) = \left(\prod_{j=1}^{x} p_j\right) S(n,x;\mathbf{q}), \quad x = 0,1,\ldots,n, \tag{4.4}
\]
where $q_j = 1-p_j$, $j = 1,2,\ldots,n$, and $S(n,x;\mathbf{q})$ is the generalized Stirling number of the second kind.

The stochastic model of a sequence of Bernoulli trials with the probability of success at any trial depending on the number of previous successes was introduced by Woodbury (1949). The probability function $P(X_n = x)$, by solving the recurrence relation
\[
P(X_n = x) = (1-q_{x-1})P(X_{n-1} = x-1) + q_x P(X_{n-1} = x), \quad x = 1,2,\ldots,n,
\]
with $P(X_0 = 0) = 1$ and $P(X_n = x) = 0$, for $x > n$, was expressed in terms of the xth divided differences of $t^n$ at the points $q_j$, $j = 1,2,\ldots,n$. It can be shown, using (2.16), that the generalized Stirling numbers of the second kind $S(n,x;\mathbf{q})$ are given by this expression (see Charalambides, 2005b, p. 142). It should be noted that no explicit expression for the moments of $X_n$ is known.
An application of these generalized binomial and negative binomial distributions in a reliability testing model, discussed by Sen and Balakrishnan (1999), is presented in the following example.

Example 4.1 (An inverse sampling scheme in reliability). Testing of expensive and sophisticated systems often poses challenging evaluation dilemmas, especially when their effectiveness is best demonstrated by destructive experiments with two possible outcomes, success or failure. During the development of such a system, a test–redesign–retest cycle is often pursued in a sequence of stages. Precisely, the testing at the jth stage is performed on a batch of units and continues until a failed unit is observed. Then, the testing at the following stage is not resumed until the remaining units have undergone a fix for the failure modes identified at the previous stage. As a result of the corrective action, the probability $p_j$ of a unit failure decreases with the number j of stages, $j = 1,2,\ldots$. Considering the testing of a unit as a Bernoulli trial with success the failure of the unit, the testing of a batch of units at each stage constitutes a geometric sequence of trials. Further, the sequence of stages forms a sequence of independent geometric sequences of trials, with probability of success $p_j$, $j = 1,2,\ldots$. Consequently, the probability function and factorial moments of the number $T_k$ of trials until the occurrence of the kth success (kth failed unit) are given by (4.1) and (4.2). Also, the probability function of the number $X_n$ of successes (failed units, stages) in n trials is given by (4.4). Sen and Balakrishnan (1999) obtained, by induction on k, the probability function of $T_k$; their expression is merely in terms of the generalized Stirling numbers of the second kind.
Also, they expressed the probability function of $X_n$ as
\[
P(X_n = x) = P(T_x = n) + P(T_{x+1} = n)\,\frac{1-p_{x+1}}{p_{x+1}},
\]
which in view of the triangular recurrence relation of the generalized Stirling numbers of the second kind,
\[
S(n,x;\mathbf{q}) = S(n-1,x-1;\mathbf{q}) + q_{x+1} S(n-1,x;\mathbf{q}),
\]
conforms with expression (4.4).

The particular cases in which the probability of success at any trial varies geometrically with the number of previous successes are reviewed in this section.

4.1. Success probability geometrically increasing

Consider first a sequence of independent geometric sequences of trials and assume that the probability of success at the jth geometric sequence of trials is given by
\[
p_j = 1-\theta q^{j-1}, \quad j = 1,2,\ldots,\ 0 < q < 1,\ 0 < \theta < 1,
\]
which is a geometrically increasing sequence with rate q. Note that this is essentially the conditional probability of success at any Bernoulli trial given that $j-1$ successes occur in the previous trials. The probability function and factorial moments of the number of failures until the occurrence of a specific number of successes are given in the following corollary.

Corollary 4.1. The probability function of the number $W_k$ of failures until the occurrence of the kth success is given by
\[
P(W_k = w) = \begin{bmatrix} k+w-1 \\ w \end{bmatrix}_q \theta^{w} \prod_{j=1}^{k} (1-\theta q^{j-1}), \quad w = 0,1,\ldots, \tag{4.5}
\]
for $0 < q < 1$ and $0 < \theta < 1$. Its factorial moments are given by
\[
E[(W_k)_j] = j! \sum_{m=j}^{\infty} \begin{bmatrix} k+m-1 \\ m \end{bmatrix}_q \frac{\theta^{m} (q^{-1}-1)^{m-j}\,|s_q(m,j)|}{\prod_{i=1}^{m}(1-\theta q^{k+i-1})}, \quad j = 1,2,\ldots, \tag{4.6}
\]
where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
\[
E(W_k) = \sum_{j=1}^{k} \frac{\theta q^{j-1}}{1-\theta q^{j-1}}, \qquad V(W_k) = \sum_{j=1}^{k} \frac{\theta q^{j-1}}{(1-\theta q^{j-1})^{2}}.
\]
Proof. The probability function (4.5) is readily deduced from (4.1) in conjunction with (3.5), by setting $q_j = \theta q^{j-1}$, $j = 1,2,\ldots$, and using (2.23). Also, the mth q-factorial moment of $W_k$, on using (2.4), is obtained as
\[
E([W_k]_{m,q}) = \frac{[k+m-1]_{m,q}\,\theta^{m}}{\prod_{j=1}^{m}(1-\theta q^{k+j-1})}, \quad m = 1,2,\ldots.
\]
Then, applying (3.6), the factorial moments (4.6) are deduced. The mean and variance of $W_k$ may be deduced from (4.6), using (2.12), or directly from (4.3) in conjunction with (3.5). □

The probability function and factorial moments of the number of failures in a given number of trials are derived in the following corollary.
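A quick numerical check of Corollary 4.1 is sketched below: the probability function (4.5) is evaluated on a truncated support and its total mass, mean and variance are compared with the closed forms. The function names, the truncation point and the parameter values are illustrative assumptions.

```python
import math


def q_binomial(n, k, q):
    """Return the q-binomial coefficient [n choose k]_q for 0 < q < 1 (floating point)."""
    if k < 0 or k > n:
        return 0.0
    k = min(k, n - k)  # use the symmetry [n, k]_q = [n, n-k]_q to shorten the product
    value = 1.0
    for i in range(1, k + 1):
        value *= (1 - q**(n - k + i)) / (1 - q**i)
    return value


def failures_until_kth_success_pmf(k, theta, q, w_max):
    """Return [P(W_k = w) for w = 0..w_max] according to (4.5)."""
    norm = 1.0
    for j in range(1, k + 1):
        norm *= 1 - theta * q**(j - 1)
    return [q_binomial(k + w - 1, w, q) * theta**w * norm for w in range(w_max + 1)]


# Hypothetical parameter values.
k, theta, q = 3, 0.6, 0.5
pmf = failures_until_kth_success_pmf(k, theta, q, w_max=300)

mean = sum(w * pw for w, pw in enumerate(pmf))
variance = sum(w * w * pw for w, pw in enumerate(pmf)) - mean**2
mean_closed_form = sum(theta * q**(j - 1) / (1 - theta * q**(j - 1)) for j in range(1, k + 1))
variance_closed_form = sum(theta * q**(j - 1) / (1 - theta * q**(j - 1))**2 for j in range(1, k + 1))
```

Because the terms decay like $\theta^{w}$, the truncation at `w_max = 300` leaves a remainder far below floating-point precision for these values.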
Corollary 4.2. The probability function of the number $Y_n$ of failures in n trials is given by
\[
P(Y_n = y) = \begin{bmatrix} n \\ y \end{bmatrix}_q \theta^{y} \prod_{j=1}^{n-y} (1-\theta q^{j-1}), \quad y = 0,1,\ldots,n, \tag{4.7}
\]
for $0 < q < 1$ and $0 < \theta < 1$. Its factorial moments are given by
\[
E[(Y_n)_j] = j! \sum_{m=j}^{n} \begin{bmatrix} n \\ m \end{bmatrix}_q \theta^{m} (q^{-1}-1)^{m-j}\,|s_q(m,j)|, \quad j = 1,2,\ldots,n, \tag{4.8}
\]
and $E[(Y_n)_j] = 0$, for $j = n+1, n+2, \ldots$, where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
\[
E(Y_n) = \sum_{m=1}^{n} \frac{[n]_{m,q}\,(1-q)^{m-1}\theta^{m}}{[m]_q}
\]
and
\[
V(Y_n) = 2\sum_{m=2}^{n} \frac{[n]_{m,q}\,(1-q)^{m-2}\theta^{m}}{[m]_q}\,\zeta_{m-1,q}(1) + E(Y_n) - [E(Y_n)]^{2},
\]
where $\zeta_{m-1,q}(1) = \sum_{k=1}^{m-1} 1/[k]_q$.
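The statements of Corollary 4.2 can be verified exactly for small n. The following sketch evaluates (4.7) in rational arithmetic and checks that it sums to one, that its q-factorial moments equal $[n]_{m,q}\theta^{m}$, and that its mean agrees with the series given above. Function names and parameter values are hypothetical.

```python
from fractions import Fraction


def q_number(x, q):
    """Return [x]_q = 1 + q + ... + q**(x-1) for a non-negative integer x."""
    return sum(q**i for i in range(x))


def q_falling_factorial(x, m, q):
    """Return [x]_{m,q} = [x]_q [x-1]_q ... [x-m+1]_q (zero when m > x)."""
    value = q**0
    for i in range(m):
        value *= q_number(x - i, q)
    return value


def q_binomial(n, k, q):
    """Return the q-binomial coefficient [n choose k]_q."""
    if k < 0 or k > n:
        return 0
    return q_falling_factorial(n, k, q) / q_falling_factorial(k, k, q)


def failures_in_n_trials_pmf(n, theta, q):
    """Return [P(Y_n = y) for y = 0..n] according to (4.7)."""
    pmf = []
    for y in range(n + 1):
        tail = 1
        for j in range(1, n - y + 1):
            tail *= 1 - theta * q**(j - 1)
        pmf.append(q_binomial(n, y, q) * theta**y * tail)
    return pmf


# Hypothetical parameter values.
n, theta, q = 4, Fraction(1, 3), Fraction(1, 2)
pmf = failures_in_n_trials_pmf(n, theta, q)
mean = sum(y * w for y, w in enumerate(pmf))
mean_series = sum(
    q_falling_factorial(n, m, q) * (1 - q)**(m - 1) * theta**m / q_number(m, q)
    for m in range(1, n + 1)
)
```

Here `q_number` returns 0 for non-positive arguments, so `q_falling_factorial` vanishes automatically once the order exceeds its first argument.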
Proof. The probability function of the number $X_n$ of successes in n trials is readily deduced from (4.4), by setting $q_j = \theta q^{j-1}$, $j = 1,2,\ldots$, and using (2.23), as
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q \theta^{n-x} \prod_{j=1}^{x} (1-\theta q^{j-1}), \quad x = 0,1,\ldots,n,
\]
for $0 < q < 1$ and $0 < \theta < 1$. Since $P(Y_n = y) = P(X_n = n-y)$ and
\[
\begin{bmatrix} n \\ n-y \end{bmatrix}_q = \begin{bmatrix} n \\ y \end{bmatrix}_q,
\]
Eq. (4.7) is established. The mth q-factorial moment of $Y_n$ is easily obtained as
\[
E([Y_n]_{m,q}) = [n]_{m,q}\,\theta^{m}
\]
and so, applying (3.6), the factorial moments (4.8) are deduced. The mean and variance of $Y_n$ are deduced from (4.8), using (2.12). □

Charalambides (2010) studied the q-distributions (4.5) and (4.7) in connection with the q-Bernstein polynomials. The probability function (4.7) was also obtained by Il'inskii (2004). Further, Il'inskii and Ostrovska (2002) and Ostrovska (2003), in their probabilistic based study of the convergence of the q-Bernstein polynomials, used the q-binomial distributions (4.7) and (4.11), together with the Euler distribution (6.2). An interesting application of the q-binomial distribution (4.7) to q-boson theory in physics was discussed by Jing and Fan (1993) and Jing (1994). Specifically, in order to construct a q-binomial state, they introduced the q-binomial distribution (4.7) as a q-deformed binomial distribution.

4.2. Success probability geometrically decreasing

Consider a sequence of independent geometric sequences of trials and assume that the probability of success at the jth geometric sequence of trials is given by
\[
p_j = 1-q^{m-j+1}, \quad j = 1,2,\ldots,m,\ m \text{ a positive integer},\ 0 < q < 1,
\]
which is a geometrically decreasing sequence of a finite number of terms. The probability functions and factorial moments of the number of trials until the occurrence of a given number of successes and the number of successes in a given number of trials are obtained in the following corollaries.

Corollary 4.3. The probability function of the number $T_k$ of trials until the occurrence of the kth success, $k \le m$, is given by
\[
P(T_k = n) = \begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_q q^{(n-k)(m-k+1)} (1-q)^{k} [m]_{k,q}, \quad n = k,k+1,\ldots, \tag{4.9}
\]
for $0 < q < 1$ and m a positive integer. Its factorial moments are given by
\[
E[(T_k-k)_j] = j! \sum_{i=j}^{\infty} \begin{bmatrix} k+i-1 \\ i \end{bmatrix}_q \frac{q^{i(m-k+1)}\,q^{-(i-j)}\,|s_q(i,j)|}{(1-q)^{j}\,[m+i]_{i,q}}, \tag{4.10}
\]
for $j = 1,2,\ldots$, where $|s_q(i,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
\[
E(T_k) = k + \frac{\eta_{m,q}(1)-\eta_{m-k,q}(1)}{1-q}, \qquad V(T_k) = \frac{\eta_{m,q}(2)-\eta_{m-k,q}(2)}{(1-q)^{2}},
\]
where $\eta_{m,q}(i) = \sum_{j=1}^{m} q^{j}/[j]_q^{i}$, $0 < q < 1$, $i \ge 1$, $m = 1,2,\ldots$, is the incomplete q-zeta function.
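The mean and variance in Corollary 4.3 can be compared numerically with the probability function (4.9). The sketch below evaluates (4.9) on a truncated support and checks it against the expressions in terms of $\eta_{m,q}(i)$, as well as against the sums of $1/p_j$ and $(1-p_j)/p_j^{2}$ from (4.3). Function names, the truncation point and the parameter values are illustrative assumptions.

```python
import math


def q_number(x, q):
    """Return [x]_q = 1 + q + ... + q**(x-1) for a non-negative integer x."""
    return sum(q**i for i in range(x))


def q_falling_factorial(x, m, q):
    """Return [x]_{m,q} = [x]_q [x-1]_q ... [x-m+1]_q."""
    value = 1.0
    for i in range(m):
        value *= q_number(x - i, q)
    return value


def q_binomial(n, k, q):
    """Return the q-binomial coefficient [n choose k]_q for 0 < q < 1."""
    if k < 0 or k > n:
        return 0.0
    k = min(k, n - k)
    value = 1.0
    for i in range(1, k + 1):
        value *= (1 - q**(n - k + i)) / (1 - q**i)
    return value


def trials_pmf(k, m, q, n_max):
    """Return {n: P(T_k = n)} for n = k..n_max according to (4.9)."""
    const = (1 - q)**k * q_falling_factorial(m, k, q)
    return {n: q_binomial(n - 1, k - 1, q) * q**((n - k) * (m - k + 1)) * const
            for n in range(k, n_max + 1)}


def eta(m, i, q):
    """Incomplete q-zeta function: sum over j = 1..m of q**j / [j]_q**i."""
    return sum(q**j / q_number(j, q)**i for j in range(1, m + 1))


# Hypothetical parameter values with k <= m.
k, m, q = 3, 5, 0.5
pmf = trials_pmf(k, m, q, n_max=200)
mean = sum(n * w for n, w in pmf.items())
variance = sum(n * n * w for n, w in pmf.items()) - mean**2
mean_eta = k + (eta(m, 1, q) - eta(m - k, 1, q)) / (1 - q)
variance_eta = (eta(m, 2, q) - eta(m - k, 2, q)) / (1 - q)**2
```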
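Proof. The probability function of $T_k$ is obtained from (4.1), by setting $q_j = q^{m-j+1} = q^{m}(q^{-1})^{j-1}$ and using (2.23), as
\[
P(T_k = n) = \begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_{q^{-1}} q^{(n-k)m} (1-q)^{k} [m]_{k,q}, \quad n = k,k+1,\ldots
\]
and since (see Charalambides, 2002, p. 405)
\[
\begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_{q^{-1}} = q^{-(n-k)(k-1)} \begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_q,
\]
expression (4.9) is deduced. The ith q-factorial moment of $T_k - k$ is
\[
E([T_k-k]_{i,q}) = \sum_{n=k+i}^{\infty} [n-k]_{i,q} \begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_q q^{(n-k)(m-k+1)} (1-q)^{k} [m]_{k,q}
= [k+i-1]_{i,q}\, q^{i(m-k+1)} (1-q)^{k} [m]_{k,q} \sum_{n=k+i}^{\infty} \begin{bmatrix} n-1 \\ k+i-1 \end{bmatrix}_q q^{(n-k-i)(m-k+1)}
\]
and since, by (2.4),
\[
\sum_{n=k+i}^{\infty} \begin{bmatrix} n-1 \\ k+i-1 \end{bmatrix}_q q^{(n-k-i)(m-k+1)} = \frac{1}{(1-q)^{k+i}\,[m+i]_{k+i,q}},
\]
it reduces to
\[
E([T_k-k]_{i,q}) = [k+i-1]_{i,q}\, \frac{q^{i(m-k+1)}}{(1-q)^{i}\,[m+i]_{i,q}}, \quad i = 1,2,\ldots.
\]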
Thus, using (3.6), the required expression (4.10) is deduced. The mean and variance of $T_k$ may be deduced from (4.10), using (2.12), or directly from (4.3). □

Corollary 4.4. The probability function of the number $X_n$ of successes in n trials is given by
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q q^{(n-x)(m-x)} (1-q)^{x} [m]_{x,q}, \quad x = 0,1,\ldots,n, \tag{4.11}
\]
for $0 < q < 1$ and m a positive integer. Its factorial moments are given by
\[
E[(X_n)_j] = j! \sum_{i=j}^{n} \begin{bmatrix} n \\ i \end{bmatrix}_q [m]_{i,q}\, (1-q)^{2i-j}\, q^{-(i-j)}\, |s_q(i,j)|, \quad j = 1,2,\ldots,n, \tag{4.12}
\]
and $E[(X_n)_j] = 0$, for $j = n+1, n+2, \ldots$, where $|s_q(i,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
\[
E(X_n) = \sum_{i=1}^{n} \frac{[n]_{i,q}\,[m]_{i,q}\,(1-q)^{2i-1}}{[i]_q}, \qquad
V(X_n) = 2\sum_{i=2}^{n} \frac{[n]_{i,q}\,[m]_{i,q}\,(1-q)^{2i-2}\,\zeta_{i-1,q}(1)}{[i]_q} + E(X_n) - [E(X_n)]^{2},
\]
where $\zeta_{i-1,q}(1) = \sum_{k=1}^{i-1} 1/[k]_q$.
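Proof. The probability function of $X_n$ is obtained from (4.4), by setting $q_j = q^{m-j+1} = q^{m}(q^{-1})^{j-1}$ and using (2.23), as
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_{q^{-1}} q^{(n-x)m} (1-q)^{x} [m]_{x,q}, \quad x = 0,1,\ldots,n,
\]
and since
\[
\begin{bmatrix} n \\ x \end{bmatrix}_{q^{-1}} = q^{-(n-x)x} \begin{bmatrix} n \\ x \end{bmatrix}_q,
\]
expression (4.11) is deduced. The ith q-factorial moment of $X_n$ is
\[
E([X_n]_{i,q}) = \sum_{x=i}^{n} [x]_{i,q} \begin{bmatrix} n \\ x \end{bmatrix}_q q^{(n-x)(m-x)} (1-q)^{x} [m]_{x,q}
= [n]_{i,q}\,[m]_{i,q}\,(1-q)^{i} \sum_{x=i}^{n} \begin{bmatrix} n-i \\ x-i \end{bmatrix}_q q^{(n-x)(m-x)} (1-q)^{x-i} [m-i]_{x-i,q}
\]
and since
\[
\sum_{x=i}^{n} \begin{bmatrix} n-i \\ x-i \end{bmatrix}_q q^{[(n-i)-(x-i)][(m-i)-(x-i)]} (1-q)^{x-i} [m-i]_{x-i,q} = 1,
\]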
it reduces to
\[
E([X_n]_{i,q}) = [n]_{i,q}\,[m]_{i,q}\,(1-q)^{i}, \quad i = 1,2,\ldots,n.
\]
Thus, using (3.6), the required expression (4.12) is deduced. The mean and variance of $X_n$ are deduced from (4.12), using (2.12). □

A variety of applications of the distributions (4.9) and (4.11) have been discussed by several authors. The following example is due to Dunkl (1981).

Example 4.2 (Proofreading a manuscript). Suppose that a proofreader reads a manuscript, which has a fixed number m of errors, and when he finds an error he corrects it and starts reading the manuscript from the beginning. The proofreader also starts reading the manuscript from the beginning when he reaches its end. A scan (reading) of the manuscript is successful if the proofreader finds (and corrects) an error and is a failure otherwise. Thus, a scan of the manuscript constitutes a Bernoulli trial. Assume that the probability of finding any particular error is $p = 1-q$. Then the probability that a scan (trial) is successful, given that $j-1$ scans (trials) were successful, is
\[
p_j = 1-q^{m-j+1}, \quad j = 1,2,\ldots.
\]
Consequently, the probability function and factorial moments of the number $X_n$ of successful scans in n scans are given by (4.11) and (4.12), respectively. Also, the probability function and factorial moments of the number $T_k$ of scans until the occurrence of the kth successful scan are given by (4.9) and (4.10), respectively.

Blomqvist (1952) considered the situation of n people crossing a minefield containing m mines, and derived the probability function of the number $X_n$ of people failing to cross the minefield. Borenius (1953), setting $q^{n} \cong e^{-n(1-q)}$, obtained the following approximate expressions for the mean and variance of $X_n$:
\[
E(X_n) \cong (1-q^{n})(1-q^{m}), \qquad
V(X_n) \cong \frac{(1-q^{n})(1-q^{m})(1-q)\,q^{n+m}}{[1-(1-q^{n})(1-q^{m})]^{2}}.
\]
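The exact distribution (4.11) underlying these models can be reproduced by a short recursion on the number of previous successes, using only the conditional success probabilities $p_j = 1-q^{m-j+1}$. The sketch below compares that recursion with the closed form (4.11) and with the first q-factorial moment $[n]_q[m]_q(1-q)$ in exact rational arithmetic. Function names and parameter values are hypothetical, and the convention $p_j = 0$ for $j > m$ (no errors, mines or animals left) is an assumption made here for the recursion.

```python
from fractions import Fraction


def q_number(x, q):
    """Return [x]_q = 1 + q + ... + q**(x-1); zero for x <= 0."""
    return sum(q**i for i in range(x))


def q_falling_factorial(x, m, q):
    """Return [x]_{m,q} = [x]_q [x-1]_q ... [x-m+1]_q (zero when m > x)."""
    value = q**0
    for i in range(m):
        value *= q_number(x - i, q)
    return value


def q_binomial(n, k, q):
    """Return the q-binomial coefficient [n choose k]_q."""
    if k < 0 or k > n:
        return 0
    return q_falling_factorial(n, k, q) / q_falling_factorial(k, k, q)


def absorption_pmf_closed_form(n, m, q):
    """Return [P(X_n = x) for x = 0..n] according to (4.11)."""
    return [q_binomial(n, x, q) * q**((n - x) * (m - x)) * (1 - q)**x
            * q_falling_factorial(m, x, q) for x in range(n + 1)]


def absorption_pmf_recursive(n, m, q):
    """Same distribution, built trial by trial from p_j = 1 - q**(m-j+1)."""
    def p(j):
        return 1 - q**(m - j + 1) if j <= m else 0
    pmf = [1]
    for _ in range(n):
        new = [0] * (len(pmf) + 1)
        for x, w in enumerate(pmf):
            succ = p(x + 1)            # x previous successes, so this would be success x+1
            new[x] += w * (1 - succ)
            new[x + 1] += w * succ
        pmf = new
    return pmf


# Hypothetical parameter values.
n, m, q = 4, 6, Fraction(1, 3)
closed = absorption_pmf_closed_form(n, m, q)
recursive = absorption_pmf_recursive(n, m, q)
first_q_moment = sum(q_number(x, q) * w for x, w in enumerate(closed))
```

The two constructions agree term by term, which also illustrates why (4.11) assigns zero probability to x > m.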
Kemp (1998) considered the sequential capture of animals from a closed population of m endangered animals. When an animal is found, it is transferred to a captive breeding program and the search is abandoned for that day. The probability functions and the first two moments of the number $X_n$ of animals found in n days and the number $T_k$ of search days until k animals are found were obtained using probability generating functions. Newby (1999) suggested a shift operator technique that allows the derivation of recursions for the expected value of a function and deduced the first two factorial moments.

An interesting extension of these models, studied by Rawlings (1997), is presented in the following example.

Example 4.3 (An absorption process). Suppose that batches of r particles are sequentially propelled into a chamber of l consecutive lined cells, with the capacity of each cell limited to one particle. Initially, a batch of r particles occupies the r leftmost cells. Then, a coin, with probability p of heads and $q = 1-p$ of tails, is successively tossed. When a tail occurs, each of the r particles of the batch moves one cell to the right, while when a head occurs the batch of the r particles is absorbed and the cells with the absorbed particles are removed. A batch of r particles which successfully reaches the r rightmost cells is said to have escaped, and its particles are removed from the chamber without removing these cells. Subsequent batches of r particles are propelled into the chamber of the remaining cells. Clearly, the conditional probability of an absorption of a batch of r particles, given that $j-1$ absorptions occur, is given by
\[
p_j = 1-q^{l-rj+1}, \quad j = 1,2,\ldots.
\]
Setting $l = (m+1)r-1$, with $m > 0$ not necessarily an integer, it follows that
\[
p_j = 1-q^{r(m-j+1)}, \quad j = 1,2,\ldots
\]
and the probability function of the number $X_n$ of absorbed batches of r particles, when n batches are propelled into the chamber of l cells, was combinatorially derived by Rawlings (1997) in the form (4.11), with $q^{r}$ instead of q. Similarly, the probability function of the number $T_k$ of batches of r particles required to achieve k absorptions was obtained in the form (4.9), with $q^{r}$ instead of q. Notice that these results implicitly show that the integer restriction on m, for (4.9) and (4.11), can be relaxed.

Another nice extension, in a direction different from that of Rawlings, which was examined by Zacks and Goldfard (1966) and Barakat (1985), is discussed in the following example.

Example 4.4 (Crossing a field with a random number of absorption points). Consider a fixed number n of particles that are required to cross a field containing a random number M of absorption points (traps) acting independently. If a particle clashes (contacts) with any of the absorption points, it is absorbed (trapped) with probability $p = 1-q$. An absorption point (trap) is ruined when it absorbs (traps) a particle. The sequential crossings of the field by the n particles constitute a sequence of independent Bernoulli trials. Clearly, given M = m, the conditional probability of an absorption of a particle at any trial, given that $j-1$ absorptions occur, is
\[
p_j = 1-q^{m-j+1}, \quad j = 1,2,\ldots,m.
\]
Consequently, by (4.11), the conditional probability function of the number $X_n$ of absorbed particles, given that M = m, is
\[
P(X_n = x \mid M = m) = \begin{bmatrix} n \\ x \end{bmatrix}_q q^{(n-x)(m-x)} (1-q)^{x} [m]_{x,q}, \quad x = 0,1,\ldots,n,\ 0 < q < 1.
\]
Further, Zacks and Goldfard (1966), assuming that M has the Poisson distribution with $E(M) = \lambda$, obtained the distribution of $X_n$, essentially in the form
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q \sum_{j=0}^{x} (-1)^{j} \begin{bmatrix} x \\ j \end{bmatrix}_q q^{\binom{j+1}{2}-x(n-x+j)} \exp[-\lambda(1-q^{n-x+j})],
\]
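for $x = 0,1,\ldots,n$, with $0 < q < 1$ and $0 < \lambda < \infty$. It is interesting to determine the distribution of $X_n$ when the number M of absorption points (traps) has a q-Poisson distribution.

(a) Let M have the Heine distribution with probability function (6.1). Then, from
\[
P(X_n = x) = \sum_{m=x}^{\infty} P(X_n = x \mid M = m)\,P(M = m), \quad x = 0,1,\ldots,n,
\]
it follows that
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q q^{\binom{x}{2}} [\lambda(1-q)]^{x}\, e_q(-\lambda) \sum_{m=x}^{\infty} \frac{q^{\binom{m-x}{2}} (\lambda q^{n})^{m-x}}{[m-x]_q!}
\]
and since
\[
e_q(-\lambda) \sum_{m=x}^{\infty} \frac{q^{\binom{m-x}{2}} (\lambda q^{n})^{m-x}}{[m-x]_q!} = \frac{E_q(\lambda q^{n})}{E_q(\lambda)} = \frac{1}{\prod_{i=1}^{n} (1+\lambda(1-q)q^{i-1})},
\]
it reduces to
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q \frac{\theta^{x} q^{\binom{x}{2}}}{\prod_{i=1}^{n} (1+\theta q^{i-1})}, \quad x = 0,1,\ldots,n,
\]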
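with $\theta = \lambda(1-q)$, which is the probability function (3.7) of the q-binomial distribution I.

(b) Let M have the Euler distribution with probability function (6.2). Then
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q [\lambda(1-q)]^{x}\, E_q(-\lambda) \sum_{m=x}^{\infty} \frac{(\lambda q^{n-x})^{m-x}}{[m-x]_q!}
\]
and since
\[
E_q(-\lambda) \sum_{m=x}^{\infty} \frac{(\lambda q^{n-x})^{m-x}}{[m-x]_q!} = \frac{e_q(\lambda q^{n-x})}{e_q(\lambda)} = \prod_{i=1}^{n-x} (1-\lambda(1-q)q^{i-1}),
\]
it reduces to
\[
P(X_n = x) = \begin{bmatrix} n \\ x \end{bmatrix}_q \theta^{x} \prod_{i=1}^{n-x} (1-\theta q^{i-1}), \quad x = 0,1,\ldots,n,
\]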
with $\theta = \lambda(1-q)$, which is the probability function (4.7) of the q-binomial distribution II.

Consider now a sequence of independent geometric sequences of trials and assume that the probability of success at the jth geometric sequence of trials is given by
\[
p_j = q^{r+j-1}, \quad j = 1,2,\ldots,\ 0 < q < 1,\ 0 \le r < \infty,
\]
which is a geometrically decreasing sequence with rate (proportion) q. Dubman and Sherman (1969) considered this stochastic model as a reliability growth model, with success the failure of a device, and estimated the parameters q and $\theta = q^{r}$. The probability function and factorial moments of the number $T_k$ of trials until the occurrence of the kth success are readily deduced from (4.1) and (4.2) by setting $q_j = 1-q^{r+j-1} = (1-q)[r+j-1]_q$, $a_j = (q^{-1})^{r+j-1}$, for $j = 1,2,\ldots$, and using (2.25) and (2.23), respectively. Then, the following corollary is deduced.

Corollary 4.5. The probability function of the number $T_k$ of trials until the occurrence of the kth success is given by
\[
P(T_k = n) = q^{\binom{k}{2}+rk} (1-q)^{n-k} S_q(n-1,k-1;r), \quad n = k,k+1,\ldots, \tag{4.13}
\]
with $0 < q < 1$ and $0 \le r < \infty$, where $S_q(n,k;r)$ is the noncentral q-Stirling number of the second kind. Its ascending factorial moments are given by
\[
E[(T_k+j-1)_j] = j!\,q^{-j(r+k-1)} \begin{bmatrix} k+j-1 \\ j \end{bmatrix}_q, \quad j = 1,2,\ldots. \tag{4.14}
\]
In particular, its mean and variance are given by
\[
E(T_k) = q^{-(r+k-1)}[k]_q, \qquad V(T_k) = q^{-2(r+k-1)}[k]_{q^2} - q^{-(r+k-1)}[k]_q.
\]
The probability function and factorial moments of the number of successes in a given number of trials are derived in the following corollary.

Corollary 4.6. The probability function of the number $X_n$ of successes in n trials is given by
\[
P(X_n = x) = q^{\binom{x}{2}+rx} (1-q)^{n-x} S_q(n,x;r), \quad x = 0,1,\ldots,n, \tag{4.15}
\]
with $0 < q < 1$ and $0 \le r < \infty$, where $S_q(n,x;r)$ is the noncentral q-Stirling number of the second kind. Its factorial moments are given by
\[
E[(X_n)_j] = j! \sum_{m=j}^{n} \binom{n}{m} q^{\binom{m}{2}+rm}\, |s_{q^{-1}}(m,j)|\, (q-1)^{m-j}, \quad j = 1,2,\ldots,n, \tag{4.16}
\]
and $E[(X_n)_j] = 0$, for $j = n+1, n+2, \ldots$, where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind.

Proof. The probability function (4.15) is readily deduced from (4.4) by setting $q_j = 1-q^{r+j-1} = (1-q)[r+j-1]_q$, $j = 1,2,\ldots$, and using (2.25), with $\theta = 1-q$. Introducing the following explicit expression of the noncentral generalized q-Stirling numbers of the second kind (see Charalambides, 2004)
\[
S_q(n,x;r) = \frac{1}{(1-q)^{n-x}} \sum_{j=x}^{n} (-1)^{j-x} q^{r(j-x)} \binom{n}{j} \begin{bmatrix} j \\ x \end{bmatrix}_q,
\]
Eq. (4.15) may be written as
\[
P(X_n = x) = q^{\binom{x}{2}} \sum_{j=x}^{n} (-1)^{j-x} q^{rj} \binom{n}{j} \begin{bmatrix} j \\ x \end{bmatrix}_q, \quad x = 0,1,\ldots,n.
\]
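In order to find the factorial moments of $X_n$, it is more convenient to calculate first the q-factorial moments $E([X_n]_{m,q^{-1}})$, $m = 1,2,\ldots$. So, using the last expression of the probability function and interchanging the order of summation,
\[
E([X_n]_{m,q^{-1}}) = [m]_{q^{-1}}! \sum_{j=m}^{n} (-1)^{j-m} q^{rj} \binom{n}{j} \sum_{x=m}^{j} (-1)^{x-m} q^{\binom{x}{2}} \begin{bmatrix} x \\ m \end{bmatrix}_{q^{-1}} \begin{bmatrix} j \\ x \end{bmatrix}_q
= [m]_{q^{-1}}! \sum_{j=m}^{n} (-1)^{j-m} q^{\binom{m}{2}+rj} \binom{n}{j} \sum_{x=m}^{j} (-1)^{x-m} q^{\binom{x-m}{2}} \begin{bmatrix} x \\ m \end{bmatrix}_q \begin{bmatrix} j \\ x \end{bmatrix}_q.
\]
Since, by (2.3),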
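\[
\sum_{x=m}^{j} (-1)^{x-m} q^{\binom{x-m}{2}} \begin{bmatrix} x \\ m \end{bmatrix}_q \begin{bmatrix} j \\ x \end{bmatrix}_q = \begin{bmatrix} j \\ m \end{bmatrix}_q \sum_{i=0}^{j-m} (-1)^{i} q^{\binom{i}{2}} \begin{bmatrix} j-m \\ i \end{bmatrix}_q = \delta_{j,m},
\]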
it follows that
\[
E([X_n]_{m,q^{-1}}) = [m]_{q^{-1}}!\, q^{\binom{m}{2}+rm} \binom{n}{m}, \tag{4.17}
\]
and so, applying (3.6) with $q^{-1}$ instead of q, expression (4.16) is obtained. □
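The alternating-sum form of (4.15) and the moment formula (4.17) can be confirmed exactly for small n. The sketch below builds the distribution of $X_n$ from the conditional success probabilities $p_j = q^{r+j-1}$ by a recursion on the number of previous successes and compares it with the alternating sum, and then checks (4.17) for m = 1 and m = 2. Function names and parameter values are illustrative assumptions, with r taken as an integer so that exact rational arithmetic applies.

```python
from fractions import Fraction
from math import comb


def q_number(x, q):
    """Return [x]_q = 1 + q + ... + q**(x-1); zero for x <= 0."""
    return sum(q**i for i in range(x))


def q_binomial(n, k, q):
    """Return the q-binomial coefficient [n choose k]_q."""
    if k < 0 or k > n:
        return 0
    num = den = q**0
    for i in range(1, k + 1):
        num *= 1 - q**(n - k + i)
        den *= 1 - q**i
    return num / den


def pmf_recursive(n, r, q):
    """pmf of X_n when a trial succeeds with probability q**(r+j-1) after j-1 successes."""
    pmf = [1]
    for _ in range(n):
        new = [0] * (len(pmf) + 1)
        for x, w in enumerate(pmf):
            succ = q**(r + x)          # x previous successes, i.e. j = x + 1
            new[x] += w * (1 - succ)
            new[x + 1] += w * succ
        pmf = new
    return pmf


def pmf_alternating_sum(n, r, q):
    """pmf of X_n from the alternating-sum form of (4.15)."""
    return [q**(x * (x - 1) // 2)
            * sum((-1)**(j - x) * q**(r * j) * comb(n, j) * q_binomial(j, x, q)
                  for j in range(x, n + 1))
            for x in range(n + 1)]


# Hypothetical parameter values; r = 1 corresponds to the counter of Example 4.5.
n, r, q = 6, 1, Fraction(1, 2)
pmf = pmf_recursive(n, r, q)
first_moment_base_inv_q = sum(q_number(x, 1 / q) * w for x, w in enumerate(pmf))
estimate_of_n = first_moment_base_inv_q / q   # E(q^{-1} [X_n]_{q^{-1}}) when r = 1
```

With r = 1 the last quantity equals n exactly, which is the unbiasedness property used in the approximate counting example that follows.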
The distribution (4.15) plays a central role in many algorithmic analyses. A probabilistic (approximate) counting algorithm, examined in detail by Flajolet (1985), is presented in the following example.

Example 4.5 (A probabilistic algorithm for counting events in a small counter). An n-bit register can ordinarily be used to count up to $2^{n}-1$ events. If the requirement of accuracy is dropped, the following probabilistic (approximate) counting algorithm was proposed. If $C_n$ is the number of events counted after n trials (occurrences of events), the approximate counting starts with the initial value $C_0 = 1$. At each trial, the occurrence of an event is counted with probability
\[
P(C_{n+1} = j+1 \mid C_n = j) = q^{j}, \quad j = 1,2,\ldots,\ n = 1,2,\ldots,
\]
where $q = 1/a$, with a the base in the increment procedure of the algorithm. Consequently, $C_n = X_n + 1$ and according to (4.15),
\[
P(C_n = k) = q^{\binom{k-1}{2}+k-1} (1-q)^{n-k+1} S_q(n,k-1;1) = q^{\binom{k}{2}} (1-q)^{n-k+1} S_q(n+1,k), \quad k = 1,2,\ldots,n+1.
\]
Also, from (4.17) it follows that $E(q^{-1}[C_n-1]_{q^{-1}})=q^{-1}E([X_n]_{q^{-1}})=n$ and so $\hat{n}=a[C_n-1]_a$ is an unbiased estimator of $n$, which was the objective for choosing the probabilities of counting or not counting an event.
Crippa et al. (1997) obtained the probability function of the size (width) of the chain decomposition of an acyclic random graph $G_{n,p}$ of $n$ nodes in the form (4.15). The probability functions (4.13) and (4.15) and the factorial moments (4.14) were deduced as a corollary of the corresponding expressions under a more general model in Charalambides (2004, 2005a).
5. Success probability varying with the number of trials and the number of successes
Consider a sequence of independent Bernoulli trials and assume that the probability of success at the ith trial, given that $j-1$ successes occur in the $i-1$ previous trials, is given by
$$P_{i,j}(\{s\})=p_{i,j}=\frac{1-b_j}{1+a_i},\quad i=1,2,\ldots,\ j=1,2,\ldots,$$
where $a_i\ge 0$, $i=1,2,\ldots$, and $0\le b_j\le 1$, $j=1,2,\ldots$. The probability function of the number $X_n$ of successes up to the nth trial is obtained in the following theorem.
Theorem 5.1. The probability function of the number $X_n$ of successes in $n$ trials is given by
$$P(X_n=x)=\frac{\prod_{j=1}^{x}(1-b_j)}{\prod_{i=1}^{n}(1+a_i)}\,C(n,x;a,b),\quad x=0,1,\ldots,n, \tag{5.1}$$
where $C(n,x;a,b)$ is the generalized Lah number.
Proof. The probability function of $X_n$ satisfies the recurrence relation
$$P(X_n=x)=\frac{a_n+b_{x+1}}{1+a_n}P(X_{n-1}=x)+\frac{1-b_x}{1+a_n}P(X_{n-1}=x-1),$$
for $x=1,2,\ldots,n$, $n=1,2,\ldots$, with initial conditions
$$P(X_0=0)=1,\quad P(X_n=0)=\frac{\prod_{i=1}^{n}(b_1+a_i)}{\prod_{i=1}^{n}(1+a_i)},\ n>0,\quad P(X_0=x)=0,\ x>0.$$
Clearly, the sequence
$$c_{n,x}=\frac{\prod_{i=1}^{n}(1+a_i)}{\prod_{j=1}^{x}(1-b_j)}P(X_n=x),\quad x=0,1,\ldots,n,\ n=0,1,\ldots,$$
satisfies the recurrence relation
$$c_{n,x}=(a_n+b_{x+1})c_{n-1,x}+c_{n-1,x-1},\quad x=1,2,\ldots,n,\ n=1,2,\ldots,$$
with initial conditions
$$c_{0,0}=1,\quad c_{n,0}=\prod_{i=1}^{n}(b_1+a_i),\ n>0,\quad c_{0,x}=0,\ x>0.$$
Multiplying both members of the recurrence relation by $\prod_{j=1}^{x}(t-b_j)$ and then summing it for $x=1,2,\ldots,n$, together with the initial condition for $x=0$, we deduce for
$$c_n(t)=\sum_{x=0}^{n}c_{n,x}\prod_{j=1}^{x}(t-b_j),\quad n=1,2,\ldots,$$
the recurrence relation
$$c_n(t)=(t+a_n)c_{n-1}(t),\quad n=1,2,\ldots,\quad c_0(t)=1,$$
which implies $c_n(t)=\prod_{i=1}^{n}(t+a_i)$. Therefore, by (2.20), $c_{n,x}=C(n,x;a,b)$ and since
$$P(X_n=x)=\frac{\prod_{j=1}^{x}(1-b_j)}{\prod_{i=1}^{n}(1+a_i)}\,c_{n,x},\quad x=0,1,\ldots,n,$$
expression (5.1) is established. □
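The recurrence used in the proof of Theorem 5.1 translates directly into a small table computation. The sketch below, in exact rational arithmetic, builds the numbers $c_{n,x}=C(n,x;a,b)$ from that recurrence, forms the probability function (5.1), and compares it with a direct trial-by-trial evaluation using $p_{i,j}=(1-b_j)/(1+a_i)$. The function names and the sample sequences are hypothetical choices made only for illustration.

```python
from fractions import Fraction


def generalized_lah(a, b):
    """Table c[n][x] = C(n, x; a, b) built from the recurrence
    c[n][x] = (a_n + b_{x+1}) * c[n-1][x] + c[n-1][x-1].
    Lists are 0-based: a[i-1] holds a_i and b[j-1] holds b_j."""
    n_max = len(a)
    c = [[Fraction(0)] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = Fraction(1)
    for n in range(1, n_max + 1):
        for x in range(n + 1):
            stay = (a[n - 1] + b[x]) * c[n - 1][x] if x < n else Fraction(0)
            step = c[n - 1][x - 1] if x >= 1 else Fraction(0)
            c[n][x] = stay + step
    return c


def pmf_from_lah(a, b):
    """P(X_n = x) for n = len(a), following the form of (5.1)."""
    n = len(a)
    c = generalized_lah(a, b)
    denom = Fraction(1)
    for a_i in a:
        denom *= 1 + a_i
    pmf, numer = [], Fraction(1)
    for x in range(n + 1):
        pmf.append(numer * c[n][x] / denom)
        if x < n:
            numer *= 1 - b[x]  # extend the product over (1 - b_j)
    return pmf


def pmf_direct(a, b):
    """Same distribution by stepping through the trials one at a time."""
    probs = [Fraction(1)]
    for i in range(len(a)):
        nxt = [Fraction(0)] * (len(probs) + 1)
        for x, p in enumerate(probs):
            success = (1 - b[x]) / (1 + a[i])  # p_{i+1, x+1}
            nxt[x + 1] += p * success
            nxt[x] += p * (1 - success)
        probs = nxt
    return probs


if __name__ == "__main__":
    a = [Fraction(1, 2), Fraction(1), Fraction(1, 3), Fraction(2)]
    b = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
    print(pmf_from_lah(a, b))
    print(pmf_direct(a, b))
```

Exact fractions are used so that the two computations can be compared for equality rather than up to rounding.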
The probability function of the number of trials $T_k$ until the occurrence of the kth success is readily deduced from (5.1), by using the relation $P(T_k=n)=P(X_{n-1}=k-1)\,p_{n,k}$. Thus, the following theorem is obtained.
Theorem 5.2. The probability function of the number $T_k$ of trials until the occurrence of the kth success is given by
$$P(T_k=n)=\frac{\prod_{j=1}^{k}(1-b_j)}{\prod_{i=1}^{n}(1+a_i)}\,C(n-1,k-1;a,b),\quad n=k,k+1,\ldots, \tag{5.2}$$
where $C(n,k;a,b)$ is the generalized Lah number.
Remark 5.1 (Mixture representations). The family of distributions (5.1), using (2.21), may be represented as a mixture distribution,
$$P(X_n=k)=\sum_{m=k}^{n}P(Y_n=m)P(Z_m=k),\quad k=0,1,\ldots,n\ (n=1,2,\ldots)$$
with mixed distribution belonging to the family of distributions (4.4),
$$P(Z_m=k)=\prod_{j=1}^{k}(1-b_j)\,S(m,k;b),\quad k=0,1,\ldots,m\ (m=1,2,\ldots)$$
and mixing distribution belonging to the family of distributions (3.1),
$$P(Y_n=m)=\frac{|s(n,m;a)|}{\prod_{i=1}^{n}(1+a_i)},\quad m=0,1,\ldots,n\ (n=1,2,\ldots).$$
Similarly, the family of distributions (5.2) may be represented as a mixture distribution,
$$P(T_k=n)=\sum_{m=k}^{n}P(U_k=m)P(W_m=n),\quad n=k,k+1,\ldots\ (k=1,2,\ldots)$$
with mixed distribution belonging to the family of distributions (3.5),
$$P(W_m=n)=\frac{|s(n-1,m-1;a)|}{\prod_{i=1}^{n}(1+a_i)},\quad n=m,m+1,\ldots\ (m=1,2,\ldots)$$
and mixing distribution belonging to the family of distributions (4.1),
$$P(U_k=m)=\prod_{j=1}^{k}(1-b_j)\,S(m-1,k-1;b),\quad m=k,k+1,\ldots\ (k=1,2,\ldots).$$
The probability function of the number Xn of successes in n trials, (5.1), was given in Charalambides (2005b, p. 149). The particular cases in which the probability of success at a given trial varies geometrically in both the number of trials and the number of successes are reviewed in this section.
5.1. Success probability geometrically varying with the same rate
Consider first a sequence of independent Bernoulli trials and assume that the probability of success at the ith trial, given that $j-1$ successes occur in the $i-1$ previous trials, is given by
$$p_{i,j}=q^{r+i+j-2},\quad j=1,2,\ldots,i,\ i=1,2,\ldots,\ 0<q<1,\ r>0,$$
which is a geometrically decreasing sequence in both the number of trials and the number of successes with the same rate $q$. The probability function and factorial moments of the number of successes in a given number of trials are obtained in the following corollary of Theorem 5.1.
Corollary 5.1. The probability function of the number $X_n$ of successes in $n$ trials is given by
$$P(X_n=x)=\genfrac{[}{]}{0pt}{}{n}{x}_q q^{x(x+r-1)}(1-q)^{n-x}[n+r-1]_{n-x,q},\quad x=0,1,\ldots,n, \tag{5.3}$$
for $r>0$ and $0<q<1$. Its factorial moments are given by
$$E[(X_n)_j]=j!\sum_{m=j}^{n}\genfrac{[}{]}{0pt}{}{n}{m}_{q^{-1}}|s_{q^{-1}}(m,j)|\,(q-1)^{m-j}q^{m(n+r-1)},\quad j=1,2,\ldots,n, \tag{5.4}$$
and $E[(X_n)_j]=0$, for $j=n+1,n+2,\ldots$, where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
$$E(X_n)=\sum_{m=1}^{n}\genfrac{[}{]}{0pt}{}{n}{m}_q(-1)^{m-1}q^{\binom{m}{2}+rm}(1-q)^{m-1}[m-1]_q!,$$
$$V(X_n)=2\sum_{m=2}^{n}\genfrac{[}{]}{0pt}{}{n}{m}_q(-1)^{m-2}q^{\binom{m}{2}+rm}(1-q)^{m-2}[m-1]_q!\,h_{m-1,q}(1)+E(X_n)-[E(X_n)]^2,$$
where $h_{m,q}(k)=\sum_{j=1}^{m}q^{j}/[j]_q^{k}$, $k\ge 1$, $m=1,2,\ldots$, is the incomplete q-zeta function.
Proof. The probability function (5.3) is readily deduced from (5.1) by setting $a_i=-(1-q^{-(r+i-1)})$, $i=1,2,\ldots$, $b_j=1-q^{j-1}$, $j=1,2,\ldots,i$, and using (2.26). Further, its factorial moments may be obtained by writing (5.3) as
$$P(X_n=x)=\genfrac{[}{]}{0pt}{}{n}{x}_{q^{-1}}q^{x(n+r-1)}(1-q)^{n-x}[n+r-1]_{n-x,q},\quad x=0,1,\ldots,n,$$
and obtaining first the q-factorial moments
$$E([X_n]_{m,q^{-1}})=[m]_{q^{-1}}!\sum_{x=m}^{n}\genfrac{[}{]}{0pt}{}{x}{m}_{q^{-1}}\genfrac{[}{]}{0pt}{}{n}{x}_{q^{-1}}q^{x(n+r-1)}(1-q)^{n-x}[n+r-1]_{n-x,q}$$
$$=[n]_{m,q^{-1}}\sum_{x=m}^{n}\genfrac{[}{]}{0pt}{}{n-m}{x-m}_{q^{-1}}q^{x(n+r-1)}(1-q)^{n-x}[n+r-1]_{n-x,q},$$
as
$$E([X_n]_{m,q^{-1}})=[n]_{m,q^{-1}}\,q^{m(n+r-1)},\quad m=1,2,\ldots,n.$$
Thus, using (3.7), the required expression (5.4) is deduced. The mean and variance of $X_n$ are obtained from (5.4), using (2.12). □
It is also of interest to consider a sequence of independent Bernoulli trials and assume that the probability of success at the ith trial, given that $j-1$ successes occur in the $i-1$ previous trials, is given by
$$p_{i,j}=q^{r-i+j},\quad j=1,2,\ldots,i,\ i=1,2,\ldots,[r],\ 0<q<1,\ r>0.$$
This is a geometrically varying sequence with rate $q$, which is increasing with the number of trials and decreasing with the number of successes. The probability function and factorial moments of the number of successes in a given number of trials are obtained in the following corollary of Theorem 5.1.
Corollary 5.2. The probability function of the number $X_n$ of successes in $n\le[r]$ trials is given by
$$P(X_n=x)=\genfrac{[}{]}{0pt}{}{n}{x}_q q^{x(r-n+x)}(1-q)^{n-x}[r]_{n-x,q},\quad x=0,1,\ldots,n, \tag{5.5}$$
for $r>0$ and $0<q<1$.
Proof. The probability function (5.5) is readily deduced from (5.1) by setting $a_i=-(1-q^{-(r-i+1)})$, $i=1,2,\ldots,[r]$, $b_j=1-q^{j-1}$, $j=1,2,\ldots,i$, and using (2.27). □
Remark 5.2 (Connection to the Blomqvist–Dunkl distribution). The probability function of the number $Y_n$ of failures in $n\le[r]$ trials, since $P(Y_n=y)=P(X_n=n-y)$, is readily deduced from (5.5) as
$$P(Y_n=y)=\genfrac{[}{]}{0pt}{}{n}{y}_q q^{(n-y)(r-y)}(1-q)^{y}[r]_{y,q},\quad y=0,1,\ldots,n,$$
for $r>0$ and $0<q<1$. Notice that for $r=m$, a positive integer, this is exactly the probability function (4.11) of the number of successes in $n$ independent Bernoulli trials, with $p_j=1-q^{m-j+1}$, $j=1,2,\ldots,m$, $0<q<1$, the probability of success at any trial given that $j-1$ successes occur in the previous trials. This is not a coincidence; it can be explained as follows. The assumption that the probability of success at the ith trial, given that $j-1$ successes occur in the $i-1$ previous trials, is given by
$$P_{i,j}(\{s\})=p_{i,j}=q^{r-i+j},\quad j=1,2,\ldots,i,\ i=1,2,\ldots,[r],\ 0<q<1,\ r>0,$$
is equivalent to the assumption that the probability of failure at the ith trial, given that $i-j$ failures occur in the $i-1$ previous trials, is given by
$$Q_{i,i-j}(\{f\})=1-p_{i,j}=1-q^{r-i+j},\quad j=1,2,\ldots,i,\ i=1,2,\ldots,[r],\ 0<q<1,\ r>0,$$
which, by replacing $i-j$ by $j-1$, is equivalent to the assumption that the probability of failure at the ith trial, given that $j-1$ failures occur in the $i-1$ previous trials, is given by
$$Q_{i,j}(\{f\})=1-q^{r-j+1},\quad j=1,2,\ldots,[r],\ i=1,2,\ldots,\ 0<q<1,\ r>0.$$
Interchanging the notions of success and failure and taking $r=m$, a positive integer, this is exactly the assumption in the Blomqvist–Dunkl model.
5.2. Success probability geometrically varying with different rates
Consider next the more general case of a sequence of independent Bernoulli trials and assume that the probability of success at the ith trial, given that $j-1$ successes occur in the $i-1$ previous trials, is given by
$$p_{i,j}=q^{a(i-1)+b(j-1)+c},\quad j=1,2,\ldots,i,\ i=1,2,\ldots,\ 0<q<1,$$
where the parameters $a$, $b$ and $c$ are such that $0\le p_{i,j}\le 1$ for all $j=1,2,\ldots,i$, $i=1,2,\ldots$. This success probability varies geometrically in both the number of trials and the number of successes with rates $q^a$ and $q^b$, respectively. This model was introduced and studied by Crippa and Simon (1997) in connection with a variety of probabilistic problems in algorithmic graph theory. Charalambides (2004), inspired by the q-distributions obtained by Crippa and Simon, introduced and studied the noncentral generalized q-factorial coefficients and used them to express these distributions. The probability function and factorial moments of the number $X_n$ of successes up to the nth trial are derived in the following corollary of Theorem 5.1.
Corollary 5.3. The probability function of the number $X_n$ of successes in $n$ trials is given by
$$P(X_n=x)=q^{a\binom{n}{2}+b\binom{x}{2}+cn}\frac{(1-q^a)^n}{(1-q^b)^x}\,|C_{q^a}(n,x;s,r)|,\quad x=0,1,\ldots,n, \tag{5.6}$$
where $C_{q^a}(n,k;s,r)$ is the noncentral generalized q-factorial coefficient, with $s=b/a$ and $r=c/a$. Its factorial moments are given by
$$E[(X_n)_j]=j!\sum_{m=j}^{n}\genfrac{[}{]}{0pt}{}{n}{m}_{q^a}q^{(a+b)\binom{m}{2}+cm}\,|s_{q^{-b}}(m,j)|\,(q^b-1)^{m-j},\quad j=1,2,\ldots,n, \tag{5.7}$$
and $E[(X_n)_j]=0$, for $j=n+1,n+2,\ldots$, where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
$$E(X_n)=\sum_{m=1}^{n}\genfrac{[}{]}{0pt}{}{n}{m}_{q^a}q^{a\binom{m}{2}+cm}(q^b-1)^{m-1}[m-1]_{q^b}!,$$
$$V(X_n)=2\sum_{m=2}^{n}\genfrac{[}{]}{0pt}{}{n}{m}_{q^a}q^{a\binom{m}{2}+cm}(q^b-1)^{m-2}[m-1]_{q^b}!\,h_{m-1,q^b}(1)+E(X_n)-[E(X_n)]^2,$$
where $h_{m,q}(k)=\sum_{j=1}^{m}q^{j}/[j]_q^{k}$, $k\ge 1$, $m=1,2,\ldots$, is the incomplete q-zeta function.
Proof. The probability function (5.6) is readily deduced from (5.1) by setting $a_i=-(1-q^{-a(r+i-1)})$, $i=1,2,\ldots$, $b_j=1-q^{b(j-1)}$, $j=1,2,\ldots,i$, and using (2.28), with $q$ replaced by $q^a$. Note that, introducing the following explicit expression of the noncentral generalized q-factorial coefficients (see Charalambides, 2004)
$$|C_{q^a}(n,x;s,r)|=q^{-a\binom{n}{2}-cn}\frac{(1-q^b)^x}{(1-q^a)^n}\sum_{j=x}^{n}(-1)^{j-x}q^{a\binom{j}{2}+cj}\genfrac{[}{]}{0pt}{}{n}{j}_{q^a}\genfrac{[}{]}{0pt}{}{j}{x}_{q^b},$$
Eq. (5.6) may be written as
$$P(X_n=x)=q^{b\binom{x}{2}}\sum_{j=x}^{n}(-1)^{j-x}q^{a\binom{j}{2}+cj}\genfrac{[}{]}{0pt}{}{n}{j}_{q^a}\genfrac{[}{]}{0pt}{}{j}{x}_{q^b},\quad x=0,1,\ldots,n. \tag{5.8}$$
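The alternating-sum expression (5.8) can be checked numerically against the model that defines it. The sketch below assumes success probability $q^{a(i-1)+b(j-1)+c}$ at trial $i$ given $j-1$ earlier successes, evaluates the double q-binomial sum with bases $q^a$ and $q^b$, and compares it with a direct recursion over the trials. Function names and the sample parameter values are illustrative assumptions.

```python
def q_binom(n, k, q):
    """Gaussian (q-)binomial coefficient [n choose k]_q for integers n, k."""
    if k < 0 or k > n:
        return 0.0
    value = 1.0
    for i in range(1, k + 1):
        value *= (1 - q ** (n - k + i)) / (1 - q ** i)
    return value


def pmf_alternating_sum(n, q, a, b, c):
    """P(X_n = x), x = 0..n, from the alternating-sum form (5.8)."""
    return [
        q ** (b * x * (x - 1) / 2)
        * sum(
            (-1) ** (j - x)
            * q ** (a * j * (j - 1) / 2 + c * j)
            * q_binom(n, j, q ** a)
            * q_binom(j, x, q ** b)
            for j in range(x, n + 1)
        )
        for x in range(n + 1)
    ]


def pmf_by_recursion(n, q, a, b, c):
    """Same distribution by stepping through the trials one at a time."""
    probs = [1.0]
    for i in range(1, n + 1):
        nxt = [0.0] * (len(probs) + 1)
        for x, p in enumerate(probs):
            # x successes so far corresponds to j - 1 = x in p_{i,j}
            success = q ** (a * (i - 1) + b * x + c)
            nxt[x + 1] += p * success
            nxt[x] += p * (1 - success)
        probs = nxt
    return probs


if __name__ == "__main__":
    params = dict(n=6, q=0.7, a=0.5, b=1.2, c=0.3)
    for u, v in zip(pmf_alternating_sum(**params), pmf_by_recursion(**params)):
        print(f"{u:.12f}  {v:.12f}")
```

The recursion is numerically safer for large $n$, since the alternating sum suffers from cancellation; the closed form is mainly useful for analysis.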
It should be noted that expression (5.8) was first derived by Crippa and Simon (1997). In order to find the factorial moments of $X_n$, it is more convenient to calculate first the q-factorial moments $E([X_n]_{m,q^{-b}})$, $m=1,2,\ldots$. So, using (5.8), and interchanging the order of summation,
$$E([X_n]_{m,q^{-b}})=[m]_{q^{-b}}!\sum_{j=m}^{n}(-1)^{j-m}q^{a\binom{j}{2}+cj}\genfrac{[}{]}{0pt}{}{n}{j}_{q^a}\sum_{x=m}^{j}(-1)^{x-m}q^{b\binom{x}{2}}\genfrac{[}{]}{0pt}{}{x}{m}_{q^{-b}}\genfrac{[}{]}{0pt}{}{j}{x}_{q^b}$$
$$=[m]_{q^{-b}}!\sum_{j=m}^{n}(-1)^{j-m}q^{a\binom{j}{2}+b\binom{m}{2}+cj}\genfrac{[}{]}{0pt}{}{n}{j}_{q^a}\sum_{x=m}^{j}(-1)^{x-m}q^{b\binom{x-m}{2}}\genfrac{[}{]}{0pt}{}{x}{m}_{q^b}\genfrac{[}{]}{0pt}{}{j}{x}_{q^b}.$$
Since, by (2.3),
$$\sum_{x=m}^{j}(-1)^{x-m}q^{b\binom{x-m}{2}}\genfrac{[}{]}{0pt}{}{x}{m}_{q^b}\genfrac{[}{]}{0pt}{}{j}{x}_{q^b}=\genfrac{[}{]}{0pt}{}{j}{m}_{q^b}\sum_{i=0}^{j-m}(-1)^{i}q^{b\binom{i}{2}}\genfrac{[}{]}{0pt}{}{j-m}{i}_{q^b}=\delta_{j,m},$$
it follows that
$$E([X_n]_{m,q^{-b}})=[m]_{q^{-b}}!\,q^{(a+b)\binom{m}{2}+cm}\genfrac{[}{]}{0pt}{}{n}{m}_{q^a}$$
and so, applying (3.6) with $q^{-b}$ instead of $q$, expression (5.7) is obtained. In particular, for $j=1$ and 2, (5.7), on using (2.12), yields the mean and variance of $X_n$. □
Remark 5.3 (A related model). Consider a sequence of independent Bernoulli trials and assume that the probability of success at the ith trial, given that $j-1$ successes occur in the $i-1$ previous trials, is given by
$$P_{i,j}(\{s\})=p_{i,j}=1-q^{a(i-1)+b(j-1)+c},\quad j=1,2,\ldots,i,\ i=1,2,\ldots,\ 0<q<1,$$
where the parameters a, b and c are such that 0 r pi,j r 1 for all j= 1,2,y,i, i=1,2,y . Let Xn be the number of successes in n trials and Yn = n Xn. Note that Pi,j ðff gÞ ¼ 1pi,j ¼ qaði1Þ þ bðj1Þ þ c ,
j ¼ 1,2, . . . ,i, i ¼ 1,2, . . . , 0 o qo 1,
is the probability of failure at the ith trial, given that j 1 successes occur in the i 1 previous trials or, equivalently, is the probability of failure at the ith trial, given that i j failures occur in the i 1 previous trials. Consequently, Qi,j ðff gÞ ¼ qi,j ¼ qða þ bÞði1Þbðj1Þ þ c ,
j ¼ 1,2, . . . ,i, i ¼ 1,2, . . . , 0 o q o1
is the probability of failure at the ith trial, given that j 1 failures occur in the i 1 previous trials. Thus, the probability function of the number Yn of failures in n trials, may be obtained from (5.1), by interchanging the notions of success and failure, as y
n
PðYn ¼ yÞ ¼ qða þ bÞð2Þbð2Þ þ cn
ð1qa þ b Þn jC ab ðn,y; s,rÞj, ð1qb Þx q
y ¼ 0,1, . . . ,n,
with s= b/(a+ b) and r =c/(a +b). The probability function of the number Xn of successes in n trials, since P(Xn = x)= P(Yn = n x), is deduced as n
nx 2 Þ þ cn
PðXn ¼ xÞ ¼ qða þ bÞð2Þbð
ð1qa þ b Þn jC ab ðn,nx; s,rÞj, ð1qb Þx q
x ¼ 0,1, . . . ,n:
Clearly, the factorial moments of $X_n$ and $Y_n$ are connected by $E[(X_n)_j]=E[(n-Y_n)_j]$, $j=1,2,\ldots$. Crippa and Simon (1997) derived the probability function of $X_n$, via its probability generating function, and established its connection with the probability function of $Y_n$, using the recurrence relations satisfied by these probability functions.
The probability function of the number of trials $T_k$ until the occurrence of the kth success is readily deduced from (5.2), by using (2.28). Thus, the following corollary is obtained.
Corollary 5.4. The probability function of the number $T_k$ of trials until the occurrence of the kth success is given by
$$P(T_k=n)=q^{a\binom{n}{2}+b\binom{k}{2}+cn}\frac{(1-q^a)^{n-1}}{(1-q^b)^{k-1}}\,|C_{q^a}(n-1,k-1;s,r)|,\quad n=k,k+1,\ldots, \tag{5.9}$$
where $C_{q^a}(n-1,k-1;s,r)$ is the noncentral generalized q-factorial coefficient, with $s=b/a$ and $r=c/a$.
6. Limiting q-distributions
The Heine and Euler distributions, which are q-Poisson distributions, are presented in this section as limiting distributions of the q-binomial and negative q-binomial distributions discussed in Sections 3 and 4. Also, the q-logarithmic distribution is examined as the limiting distribution of a zero truncated negative q-binomial distribution.
6.1. q-Poisson distributions
The expansions of the q-exponential functions (2.7) and (2.8) assure that
$$P(X=x)=e_q(-\lambda)\frac{q^{\binom{x}{2}}\lambda^{x}}{[x]_q!},\quad x=0,1,\ldots, \tag{6.1}$$
for $0<q<1$, $0<\lambda<\infty$, and
$$P(Y=y)=E_q(-\lambda)\frac{\lambda^{y}}{[y]_q!},\quad y=0,1,\ldots, \tag{6.2}$$
for $0<q<1$, $0<\lambda<1/(1-q)$, are legitimate probability functions. The corresponding distributions are known as Heine and Euler distributions, respectively. These distributions are q-Poisson distributions, since both probability functions, for $q\to 1$,
approach the probability function of the Poisson distribution. Their factorial moments, obtained by Charalambides and Papadatos (2005), are given in the following theorem.
Theorem 6.1. (a) The factorial moments of the Heine distribution are given by
$$E[(X)_j]=j!\sum_{m=j}^{\infty}q^{\binom{m}{2}}\frac{\lambda^{m}}{[m]_q!}\cdot\frac{(q^{-1}-1)^{m-j}\,|s_q(m,j)|}{\prod_{i=1}^{m}(1+\lambda(1-q)q^{i-1})},\quad j=1,2,\ldots, \tag{6.3}$$
where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
$$E(X)=\sum_{m=1}^{\infty}\frac{q^{\binom{m}{2}}}{[m]_q}\cdot\frac{\lambda^{m}(1-q)^{m-1}}{\prod_{i=1}^{m}(1+\lambda(1-q)q^{i-1})}$$
and
$$V(X)=2\sum_{m=2}^{\infty}\frac{q^{\binom{m}{2}}}{[m]_q}\cdot\frac{\lambda^{m}(1-q)^{m-2}\,\zeta_{m-1,q}(1)}{\prod_{i=1}^{m}(1+\lambda(1-q)q^{i-1})}+E(X)-[E(X)]^2.$$
(b) The factorial moments of the Euler distribution are given by
$$E[(X)_j]=j!\sum_{m=j}^{\infty}\frac{\lambda^{m}}{[m]_q!}(q^{-1}-1)^{m-j}\,|s_q(m,j)|, \tag{6.4}$$
for $j=1,2,\ldots$. In particular, its mean and variance are given by
$$E(X)=\frac{-l_q(1-\lambda(1-q))}{1-q},$$
where $-l_q(1-t)=\sum_{m=1}^{\infty}t^{m}/[m]_q$ is a q-logarithmic function, and
$$V(X)=\frac{2}{(1-q)^{2}}\sum_{m=2}^{\infty}\frac{\lambda^{m}(1-q)^{m}}{[m]_q}\,\zeta_{m-1,q}(1)+E(X)-[E(X)]^2,$$
where $\zeta_{m-1,q}(1)=\sum_{j=1}^{m-1}1/[j]_q$.
The probability function of the q-binomial distribution I, as the number of trials tends to infinity, can be approximated by the probability function of the Heine distribution. Kemp and Newton (1990) derived the corresponding limiting expression via probability generating functions. Further, the probability function (3.11) of the negative q-binomial distribution I, as the number of failures tends to infinity, can be approximated by the probability function of the Heine distribution. A direct derivation of these limits is given in the following theorem.
Theorem 6.2. (a) The limit of the probability function (3.7) of the q-binomial distribution I, as $n\to\infty$, is the probability function of the Heine distribution,
$$\lim_{n\to\infty}\genfrac{[}{]}{0pt}{}{n}{x}_q\frac{q^{\binom{x}{2}}\theta^{x}}{\prod_{i=1}^{n}(1+\theta q^{i-1})}=e_q(-\lambda)\frac{q^{\binom{x}{2}}\lambda^{x}}{[x]_q!},\quad x=0,1,\ldots,\ \lambda=\theta/(1-q), \tag{6.5}$$
for $0<q<1$, $0<\lambda<\infty$, where $e_q(\lambda)=\prod_{i=1}^{\infty}(1-\lambda(1-q)q^{i-1})^{-1}$ is a q-exponential function.
(b) The limit of the probability function (3.11) of the negative q-binomial distribution I, as $k\to\infty$, is the probability function of the Heine distribution,
$$\lim_{k\to\infty}\genfrac{[}{]}{0pt}{}{k+y-1}{y}_q\frac{q^{\binom{y}{2}}\theta^{y}}{\prod_{i=1}^{k+y}(1+\theta q^{i-1})}=e_q(-\lambda)\frac{q^{\binom{y}{2}}\lambda^{y}}{[y]_q!},\quad y=0,1,\ldots, \tag{6.6}$$
for $0<q<1$, $0<\lambda<\infty$, with $\lambda=\theta/(1-q)$.
Proof. (a) Since, for $0<q<1$,
$$\lim_{n\to\infty}\genfrac{[}{]}{0pt}{}{n}{x}_q=\lim_{n\to\infty}\frac{1}{(1-q)^{x}[x]_q!}\prod_{i=1}^{x}(1-q^{n-i+1})=\frac{1}{(1-q)^{x}[x]_q!}$$
and
$$\lim_{n\to\infty}\prod_{i=1}^{n}(1+\lambda(1-q)q^{i-1})=E_q(\lambda)=1/e_q(-\lambda),$$
the limiting expression (6.5) is readily deduced. (b) The limiting expression (6.6) is similarly obtained. □
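The convergence in (6.5) can be observed numerically. The following sketch evaluates the q-binomial distribution I for a large number of trials and the Heine probability function with $\lambda=\theta/(1-q)$, where the infinite product defining the normalizing constant is truncated after a fixed number of factors. The function names, the truncation length and the sample parameters are assumptions made only for this illustration.

```python
def q_binom(n, k, q):
    """Gaussian (q-)binomial coefficient [n choose k]_q for integers n, k."""
    if k < 0 or k > n:
        return 0.0
    value = 1.0
    for i in range(1, k + 1):
        value *= (1 - q ** (n - k + i)) / (1 - q ** i)
    return value


def q_factorial(x, q):
    """[x]_q! = [1]_q [2]_q ... [x]_q."""
    value = 1.0
    for i in range(1, x + 1):
        value *= (1 - q ** i) / (1 - q)
    return value


def q_binomial_one_pmf(x, n, q, theta):
    """Probability function of the q-binomial distribution I, as in (6.5)."""
    denom = 1.0
    for i in range(n):
        denom *= 1 + theta * q ** i
    return q_binom(n, x, q) * q ** (x * (x - 1) // 2) * theta ** x / denom


def heine_pmf(x, q, lam, terms=2000):
    """Heine probability function; the product for 1/e_q(-lam) is truncated."""
    norm = 1.0
    for i in range(terms):
        norm *= 1 + lam * (1 - q) * q ** i
    return q ** (x * (x - 1) // 2) * lam ** x / q_factorial(x, q) / norm


if __name__ == "__main__":
    q, theta = 0.5, 0.8
    lam = theta / (1 - q)
    for n in (5, 20, 200):
        gap = max(
            abs(q_binomial_one_pmf(x, n, q, theta) - heine_pmf(x, q, lam))
            for x in range(6)
        )
        print(n, gap)
```

Because the discrepancy is driven by powers $q^{n}$, the gap shrinks geometrically in $n$ for fixed $q<1$.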
The probability functions of the negative q-binomial distribution II, as the number of successes tends to infinity, and the q-binomial distribution (4.7), as the number of trials tends to infinity, can be approximated by the Euler distribution, according to the following theorem. Its derivation can be carried out exactly as that of Theorem 6.2.
Theorem 6.3. (a) The limit of the probability function (4.7) of the q-binomial distribution II, as $n\to\infty$, is the probability function of the Euler distribution,
$$\lim_{n\to\infty}\genfrac{[}{]}{0pt}{}{n}{y}_q\theta^{y}\prod_{j=1}^{n-y}(1-\theta q^{j-1})=E_q(-\lambda)\frac{\lambda^{y}}{[y]_q!},\quad y=0,1,\ldots,\ \lambda=\theta/(1-q), \tag{6.7}$$
for $0<q<1$, $0<\lambda<1/(1-q)$, where $E_q(\lambda)=\prod_{i=1}^{\infty}(1+\lambda(1-q)q^{i-1})$ is a q-exponential function.
(b) The limit of the probability function (4.5) of the negative q-binomial distribution II, as $k\to\infty$, is the probability function of the Euler distribution,
$$\lim_{k\to\infty}\genfrac{[}{]}{0pt}{}{k+w-1}{w}_q\theta^{w}\prod_{j=1}^{k}(1-\theta q^{j-1})=E_q(-\lambda)\frac{\lambda^{w}}{[w]_q!},\quad w=0,1,\ldots, \tag{6.8}$$
for $0<q<1$, $0<\lambda<1/(1-q)$, with $\lambda=\theta/(1-q)$.
An interesting application of the Heine and Euler distributions as feasible priors in a simple Bayesian model for oil exploration, which was given by Benkherouf and Bather (1988), is presented in the following example.
Example 6.1 (Number of undiscovered oilfields in oil exploration). Suppose that an oil company has an area in which to drill and the area contains an unknown number of oilfields. The probability function $P(X=x)$, $x=0,1,\ldots$, of the number $X$ of undiscovered oilfields is required for finding optimal strategies for drilling. Suppose further that a single well can reach at most one of the undiscovered oilfields, so that it can be considered as a Bernoulli trial resulting in a success (S) or a failure (F). Let $P(S\mid X=j)=p_j$ and $P(F\mid X=j)=q_j=1-p_j$, $j=0,1,\ldots$, and assume that $q_0=1>q_1>q_2>\cdots$. Then by Bayes' theorem
$$P(X=j+1\mid S)=\frac{P(X=j+1)p_{j+1}}{\sum_{i=0}^{\infty}P(X=i)p_i},\qquad P(X=j\mid F)=\frac{P(X=j)q_j}{\sum_{i=0}^{\infty}P(X=i)q_i}.$$
Also, the posterior distributions $P(X=j+1\mid(S,F))$ and $P(X=j+1\mid(F,S))$, given a success and a failure in either order, are deduced as
$$P[X=j+1\mid(S,F)]=\frac{P(X=j+1)p_{j+1}q_j}{\sum_{i=0}^{\infty}P(X=i+1)p_{i+1}q_i},\quad j=0,1,\ldots$$
and
$$P[X=j+1\mid(F,S)]=\frac{P(X=j+1)q_{j+1}p_{j+1}}{\sum_{i=0}^{\infty}P(X=i)q_ip_i},\quad j=0,1,\ldots,$$
respectively. Thus, the posterior distribution of the number $X$ of undiscovered oilfields after a number of wells have been drilled depends on the order in which successes and failures occur. Imposing the condition
$$P[X=j+1\mid(S,F)]=P[X=j+1\mid(F,S)],\quad j=0,1,\ldots,$$
under which the number of successes and failures is a sufficient statistic, it follows that
$$q_j=q^{j},\quad j=0,1,\ldots,\ 0<q<1.$$
The derivation of a specific distribution for the number $X$ of undiscovered oilfields requires an additional assumption.
(a) Assume that $P(X=j+1\mid S)=P(X=j\mid F)$, for $j=0,1,\ldots$. Then
$$P(X=j+1)(1-q^{j+1})=\theta P(X=j)q^{j},\quad j=0,1,\ldots,$$
with $\theta=\sum_{i=0}^{\infty}P(X=i)(1-q^{i})\big/\sum_{i=0}^{\infty}P(X=i)q^{i}$. Consequently
$$P(X=x)=P(X=0)\frac{q^{\binom{x}{2}}\lambda^{x}}{[x]_q!},\quad x=1,2,\ldots,$$
where $\lambda=\theta/(1-q)$, with $\theta>0$, and, by (2.7), $P(X=0)=1/E_q(\lambda)=e_q(-\lambda)$, which is the probability function of the Heine distribution.
(b) Suppose that $P(X=j)=P(X=j+1\mid S)$, for $j=0,1,\ldots$. Then
$$P(X=j+1)(1-q^{j+1})=\theta P(X=j),\quad j=0,1,\ldots,$$
with $\theta=\sum_{i=0}^{\infty}P(X=i)(1-q^{i})$. Consequently
$$P(X=x)=P(X=0)\frac{\lambda^{x}}{[x]_q!},\quad x=1,2,\ldots,$$
where $\lambda=\theta/(1-q)$, with $0<\theta<1$, and, by (2.8), $P(X=0)=1/e_q(\lambda)=E_q(-\lambda)$, which is the probability function of the Euler distribution.
Example 6.2 (A generalization of the Euler distribution). In the Bayesian model for oil exploration, presented in Example 6.1, Benkherouf and Alzaid (1993) generalized the Euler distribution, by replacing assumption (b) with the more general assumption $P(X=j)=\rho P(X=j+1\mid S)+(1-\rho)P(X=j\mid F)$, for $j=0,1,\ldots$, with $0<\rho\le 1$, which implies the recurrence relation
$$P(X=j)=P(X=j+1)(1-q^{j+1})\rho\theta^{-1}+P(X=j)q^{j}(1-\rho)(1-\theta)^{-1},$$
for $j=0,1,\ldots$, with $\theta=\sum_{i=0}^{\infty}P(X=i)(1-q^{i})<1$. Introducing the parameters
$$\lambda=\theta/\rho,\quad r=\frac{\log(1-\rho)-\log(1-\theta)}{\log q},$$
and iterating the resulting recurrence relation, it follows that
$$P(X=x)=P(X=0)\frac{(1-q^{r})(1-q^{r+1})\cdots(1-q^{r+x-1})}{(1-q)(1-q^{2})\cdots(1-q^{x})}\lambda^{x}=P(X=0)\genfrac{[}{]}{0pt}{}{r+x-1}{x}_q\lambda^{x},\quad x=0,1,\ldots,$$
where $0<q<1$, $0<\lambda<1$ and $r>0$. Thus, by (2.2),
$$P(X=x)=\genfrac{[}{]}{0pt}{}{r+x-1}{x}_q\frac{\lambda^{x}}{\prod_{i=1}^{\infty}(1-\lambda q^{r+i-1})/(1-\lambda q^{i-1})},\quad x=0,1,\ldots, \tag{6.9}$$
where $0<q<1$, $0<\lambda<1$ and $r>0$. For $\lambda=\theta$ and $r=k$, a positive integer, it reduces to probability function (4.5) of the negative q-binomial distribution II. Note that the choice of the parameter $r$ instead of the parameter $a=q^{r}$, chosen by Benkherouf and Alzaid (1993), reveals that the random variable $X$ obeys a (general) negative q-binomial distribution.
Kemp (1992a–c) obtained the Heine and Euler distributions as steady state distributions of Markov chains and discussed other properties and applications. Also, A. Kemp (1997) studied the distributions of the differences of two Heine random variables and two Euler random variables and expressed them in terms of modified q-Bessel functions. Further, Kemp (2002a) examined existence conditions and properties for the generalized Euler family of distributions; this family, in addition to the generalization of the Euler distribution, discussed in Example 6.2, includes a variety of q-distributions with probability functions of the same mathematical form (6.9) but usually with different parameter constraints.
The conditional distribution of a Poisson random variable, given its sum with another Poisson random variable, independent of it, is a binomial distribution. Its q-analog is discussed in the following remark.
Remark 6.1 (Rogers–Szegő and Stieltjes–Wigert distributions). Kemp (2002b), in addition to the q-binomial distributions (3.7) and (4.7), examined two other q-binomial distributions, via probability generating functions. Specifically, (a) the distribution with probability function
$$P(X=x)=\genfrac{[}{]}{0pt}{}{n}{x}_q\frac{\theta^{x}}{h_n(\theta)},\quad x=0,1,\ldots,n,\ 0<q<1,\ 0<\theta<\infty,$$
where $h_n(\theta)=\sum_{j=0}^{n}\genfrac{[}{]}{0pt}{}{n}{j}_q\theta^{j}$ is the Rogers–Szegő polynomial, was called Rogers–Szegő distribution. Clearly, this is the conditional distribution of an Euler random variable, given its sum with another Euler random variable, independent of it. Further, this q-distribution may be considered as the stationary distribution of a birth and death process with birth and death rates
$$\lambda_j=\theta[n-j]_q,\ j=0,1,\ldots,n,\quad \mu_j=[j]_q,\ j=1,2,\ldots,n.$$
Also, (b) the distribution with probability function
$$P(X=x)=\genfrac{[}{]}{0pt}{}{n}{x}_q\frac{q^{x(x-1)}\theta^{x}}{s_n(-\theta q^{-3/2})},\quad x=0,1,\ldots,n,\ 0<q<1,\ 0<\theta<\infty,$$
where $s_n(\lambda)=c\sum_{j=0}^{n}\genfrac{[}{]}{0pt}{}{n}{j}_q q^{j(j+1/2)}(-\lambda)^{j}$, with $c$ a normalizing constant, is the Stieltjes–Wigert polynomial (a limit of the q-Laguerre polynomial), was called Stieltjes–Wigert distribution. Clearly, this is the conditional distribution of a Heine random variable, given its sum with another Heine random variable, independent of it. Further, this q-distribution may be considered as the stationary distribution of a birth and death process with birth and death rates
$$\lambda_j=\theta q^{2j}[n-j]_q,\ j=0,1,\ldots,n,\quad \mu_j=q^{n-1}[j]_q,\ j=1,2,\ldots,n.$$
The representation of the Rogers–Szegő distribution as the conditional distribution of an Euler random variable, given its sum with another Euler random variable, independent of it, and of the Stieltjes–Wigert distribution as the conditional distribution of a Heine random variable, given its sum with another Heine random variable, independent of it, characterizes these distributions. Similar characterizations of other discrete q-distributions are briefly indicated in the following remark.
Remark 6.2 (Characterizations of discrete q-distributions). Kemp (2001, 2003) provided characterizations of several discrete q-distributions, via the conditional distribution of a random variable $X$, given its sum, $X+Y$, with a random variable $Y$ independent of it. For example, in Kemp (2001) it was shown that the conditional distribution of $X$, given that $X+Y=m$, is the absorption distribution (4.11) if and only if $X$ obeys the q-binomial distribution (3.7) and $Y$ obeys the Heine distribution (6.1) with suitable parameters $\theta$ and $\lambda$. Also, in Kemp (2003) it was shown that the conditional distribution of $X$, given that $X+Y=n$, is the q-binomial distribution (3.7) if and only if $X$ obeys the Heine distribution (6.1) and $Y$ obeys the Euler distribution (6.2). Other discrete q-distributions were similarly characterized.
6.2. q-Logarithmic distribution
Consider the q-hypergeometric series
$$F_q([a]_q,[b]_q;[c]_q;t)=\sum_{k=0}^{\infty}\frac{[a]_q^{(k)}[b]_q^{(k)}}{[c]_q^{(k)}}\cdot\frac{t^{k}}{[k]_q!},\quad |q|<1,\ |t|<1,$$
where $[x]_q^{(k)}=[x]_q[x+1]_q\cdots[x+k-1]_q$, and let
$$-l_q(1-t)=tF_q([1]_q,[1]_q;[2]_q;t),\quad |q|<1,\ |t|<1.$$
Then
$$-l_q(1-t)=\sum_{j=1}^{\infty}\frac{t^{j}}{[j]_q},\quad |q|<1,\ |t|<1,$$
with $l_q(t)$ a q-logarithmic function. This function may also be written as a q-integral of $1/t$. Clearly
$$P(X=x)=[-l_q(1-\theta)]^{-1}\frac{\theta^{x}}{[x]_q},\quad x=1,2,\ldots,\ 0<q<1,\ 0<\theta<1 \tag{6.10}$$
is a legitimate probability function. This distribution is called q-logarithmic distribution. Note that the limit of this distribution, for $q\to 1$, is the logarithmic distribution.
C.D. Kemp (1997) introduced and studied the q-logarithmic distribution as a group size distribution. Specifically, it is the stationary distribution of a birth and death process with birth and death rates
$$\lambda_i=[i]_q\lambda,\ i=1,2,\ldots,\quad \mu_1=0,\quad \mu_i=[i]_q\mu,\ i=2,3,\ldots,$$
and $\theta=\lambda/\mu$. The factorial moments of the q-logarithmic distribution are derived in the following theorem.
Theorem 6.4. The factorial moments of the q-logarithmic distribution are given by
$$E[(X)_j]=\frac{j!}{-l_q(1-\theta)}\sum_{m=j}^{\infty}\frac{\theta^{m}}{[m]_q}(q^{-1}-1)^{m-j}\,|s_q(m,j)|\prod_{i=1}^{m}(1-\theta q^{i-1})^{-1}, \tag{6.11}$$
for $j=1,2,\ldots$, where $|s_q(m,j)|$ is the signless q-Stirling number of the first kind. In particular, its mean and variance are given by
$$E(X)=\frac{(1-q)^{-1}}{-l_q(1-\theta)}\sum_{m=1}^{\infty}\frac{[\theta(1-q)]^{m}[m-1]_q!}{[m]_q}\prod_{i=1}^{m}(1-\theta q^{i-1})^{-1}$$
and
$$V(X)=\frac{2(1-q)^{-2}}{-l_q(1-\theta)}\sum_{m=2}^{\infty}\frac{[\theta(1-q)]^{m}[m-1]_q!\,\zeta_{m-1,q}(1)}{[m]_q}\prod_{i=1}^{m}(1-\theta q^{i-1})^{-1}+E(X)-[E(X)]^2,$$
where $\zeta_{m-1,q}(1)=\sum_{j=1}^{m-1}1/[j]_q$.
Proof. The mth q-factorial moment of the q-logarithmic distribution is expressed as
$$E([X]_{m,q})=\frac{1}{-l_q(1-\theta)}\sum_{x=m}^{\infty}[x]_{m,q}\frac{\theta^{x}}{[x]_q}=\frac{[m-1]_q!}{-l_q(1-\theta)}\sum_{x=m}^{\infty}\genfrac{[}{]}{0pt}{}{x-1}{m-1}_q\theta^{x}=\frac{[m-1]_q!\,\theta^{m}}{-l_q(1-\theta)}\sum_{y=0}^{\infty}\genfrac{[}{]}{0pt}{}{m+y-1}{y}_q\theta^{y}$$
and so, using (2.4), is obtained as
$$E([X]_{m,q})=\frac{[m-1]_q!\,\theta^{m}}{-l_q(1-\theta)}\prod_{i=1}^{m}(1-\theta q^{i-1})^{-1},\quad m=1,2,\ldots.$$
Introducing it into (3.6), the factorial moments of the q-logarithmic distribution are deduced as (6.11). □
Consider the zero-truncated random variable $Z_k=W_k\mid W_k>0$, where $W_k$ obeys the negative q-binomial distribution II with probability function (4.5). The probability function of $Z_k$, $P(Z_k=z)=P(W_k=z\mid W_k>0)$, $z=1,2,\ldots$, is readily obtained as
$$P(Z_k=z)=\genfrac{[}{]}{0pt}{}{k+z-1}{z}_q\theta^{z}\left(\prod_{i=1}^{k}(1-\theta q^{i-1})^{-1}-1\right)^{-1},\quad z=1,2,\ldots, \tag{6.12}$$
with $0<q<1$ and $0<\theta<1$. This distribution, for $k\to 0$, can be approximated by the q-logarithmic distribution according to the following theorem.
Theorem 6.5. The limit of the probability function (6.12) of the zero-truncated negative q-binomial distribution II, for $k\to 0$, is the q-logarithmic distribution,
$$\lim_{k\to 0}\genfrac{[}{]}{0pt}{}{k+z-1}{z}_q\theta^{z}\left(\prod_{i=1}^{k}(1-\theta q^{i-1})^{-1}-1\right)^{-1}=[-l_q(1-\theta)]^{-1}\frac{\theta^{z}}{[z]_q}, \tag{6.13}$$
for $z=1,2,\ldots$, with $0<q<1$, $0<\theta<1$.
Proof. The limit of the probability function (6.12), since
$$\lim_{k\to 0}\frac{1}{[k]_q}\genfrac{[}{]}{0pt}{}{k+z-1}{z}_q=\frac{1}{[z]_q}\lim_{k\to 0}\genfrac{[}{]}{0pt}{}{k+z-1}{z-1}_q=\frac{1}{[z]_q}\cdot\frac{\lim_{k\to 0}[k+1]_q[k+2]_q\cdots[k+z-1]_q}{[z-1]_q!}=\frac{1}{[z]_q}$$
and
$$\lim_{k\to 0}\frac{1}{[k]_q}\left(\prod_{i=1}^{k}(1-\theta q^{i-1})^{-1}-1\right)=\lim_{k\to 0}\frac{1}{[k]_q}\sum_{j=1}^{\infty}\genfrac{[}{]}{0pt}{}{k+j-1}{j}_q\theta^{j}=\sum_{j=1}^{\infty}\frac{\theta^{j}}{[j]_q}=-l_q(1-\theta),$$
is readily obtained as (6.13). □
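The limit $k\to 0$ in (6.13) can also be explored numerically. The sketch below evaluates the zero-truncated negative q-binomial probability function for a small real $k$ and compares it with the q-logarithmic probability function (6.10). It rests on an assumption not spelled out in this section: for non-integer $k$ the q-binomial coefficient and the finite product are extended through the infinite-product form $\prod_{i=1}^{k}(1-\theta q^{i-1})=\prod_{i\ge 1}(1-\theta q^{i-1})/(1-\theta q^{k+i-1})$, truncated after a fixed number of factors. Function names and parameter values are illustrative.

```python
def neg_q_log(theta, q, terms=500):
    """-l_q(1 - theta) = sum_{j>=1} theta^j / [j]_q, truncated after `terms`."""
    return sum(theta ** j * (1 - q) / (1 - q ** j) for j in range(1, terms + 1))


def q_log_pmf(z, q, theta):
    """q-logarithmic probability function, as in (6.10)."""
    return theta ** z * (1 - q) / (1 - q ** z) / neg_q_log(theta, q)


def truncated_neg_q_binomial_pmf(z, k, q, theta, terms=200):
    """Zero-truncated negative q-binomial II, as in (6.12), for real k > 0.
    Assumption: real k is handled via the infinite-product continuation."""
    coeff = 1.0
    for i in range(1, z + 1):
        coeff *= (1 - q ** (k + i - 1)) / (1 - q ** i)
    # inv_prod approximates 1 / prod_{i=1}^{k} (1 - theta q^{i-1})
    inv_prod = 1.0
    for i in range(terms):
        inv_prod *= (1 - theta * q ** (k + i)) / (1 - theta * q ** i)
    return coeff * theta ** z / (inv_prod - 1)


if __name__ == "__main__":
    q, theta = 0.5, 0.6
    for k in (1.0, 0.1, 1e-3, 1e-6):
        gap = max(
            abs(truncated_neg_q_binomial_pmf(z, k, q, theta) - q_log_pmf(z, q, theta))
            for z in range(1, 7)
        )
        print(k, gap)
```

For integer $k$ the continuation reduces to the ordinary finite product, so the same function also reproduces (6.12) as stated.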
Kemp and Kemp (2009) studied a three-parameter generalization of the q-logarithmic distribution to which it reduces as the additional parameter tends to one. It is obtained as the cluster distribution for the generalized Euler distribution.
Acknowledgements The author is sincerely thankful to the referee for his valuable comments and suggestions towards revising this paper. This research was partially supported by the University of Athens Research Special Account under Grant 70/4/3406. References Balakrishnan, N., Nevzorov, V.B., 1997. Stirling numbers and records. In: Balakrishnan, N. (Ed.), Advances in Combinatorial Methods and Applications to ¨ Probability and Statistics. Birkhauser, Boston, pp. 189–200. Barakat, R., 1985. Probabilistic aspects of particles transiting a trapping field: an exact combinatorial solution in terms of Gauss polynomials. J. Appl. Math. Phys. 36, 422–432. Benkherouf, L., Alzaid, A.A., 1993. On the generalized Euler distribution. Statist. Probab. Lett. 18, 323–326. Benkherouf, L., Bather, J.A., 1988. Oil exploration: sequential decisions in the face of uncertainty. J. Appl. Probab. 25, 529–543. Blomqvist, N., 1952. On an exhaustion process. Skandinavisk Akktuarietidskrift 35, 201–210. Borenius, G., 1953. On the statistical distribution of mine explosions. Skandinavisk Akktuarietidskrift 36, 151–157. Charalambides, Ch.A., 2002. Enumerative Combinatorics. Chapman & Hall, CRC, Boca Raton, FL. Charalambides, Ch.A., 2004. Non-central generalized q-factorial coefficients and q-Stirling numbers. Discrete Math. 275, 67–85. Charalambides, Ch.A., 2005a. Moments of a class of discrete q-distributions. J. Statist. Plann. Inference 135, 64–76. Charalambides, Ch.A., 2005b. Combinatorial Methods in Discrete Distributions. John Wiley & Sons, Hoboken, NJ. Charalambides, Ch.A., 2007. Distributions of record statistics in a geometrically increasing population. J. Statist. Plann. Inference 137, 2214–2225. Charalambides, Ch.A., 2009. Distributions of record statistics in a q-factorially increasing population. Comm. Statist. Theory Methods 38, 1–14. Charalambides, Ch.A., 2010. The q-Bernstein basis as a q-binomial distribution. J. Statist. Plann. Inference 140, doi:10.1016/j.jspi.2010.01.014. 
Charalambides, Ch.A., Papadatos, N., 2005. The q-factorial moments of discrete q-distributions and a characterization of the Euler distribution. In: Balakrishnan, N., Bairamov, I.G., Gebizlioglu, O.L. (Eds.), Advances on Models, Characterizations and Applications. Chapman & Hall, CRC Press, Boca Raton, FL, pp. 57–71.
Crippa, D., Simon, K., 1997. q-Distributions and Markov processes. Discrete Math. 170, 81–98.
Crippa, D., Simon, K., Trunz, P., 1997. Markov processes involving q-Stirling numbers. Combin. Probab. Comput. 6, 165–178.
Dubman, M., Sherman, B., 1969. Estimation of parameters in a transient Markov chain arising in a reliability growth model. Ann. Math. Statist. 40, 1542–1556.
Dunkl, C.F., 1981. The absorption distribution and the q-binomial theorem. Comm. Statist. Theory Methods A 10, 1915–1920.
Flajolet, P., 1985. Approximate counting: a detailed analysis. BIT 25, 113–134.
Il’inskii, A., 2004. A probabilistic approach to q-polynomial coefficients, Euler and Stirling numbers I. Matematicheskaya Fizika, Analiz, Geometriya 11, 434–448.
Il’inskii, A., Ostrovska, S., 2002. Convergence of generalized Bernstein polynomials. J. Approx. Theory 116, 100–112.
Jing, S.C., 1994. The q-deformed binomial distribution and its behaviour. J. Phys. A Math. Gen. 27, 493–499.
Jing, S.C., Fan, H.Y., 1993. q-Deformed binomial state. Phys. Rev. A 49, 2277–2279.
Kemp, A., 1987. A Poissonian binomial model with constrained parameters. Naval Res. Logistics 34, 853–858.
Kemp, A., 1992a. Heine-Euler extensions of the Poisson distribution. Comm. Statist. Theory Methods 21, 791–798.
Kemp, A., 1992b. On counts of organisms able to signal the presence of an observer. Biom. J. 34, 595–604.
Kemp, A., 1992c. Steady-state Markov chain models for the Heine and Euler distributions. J. Appl. Probab. 29, 869–876.
Kemp, A., 1997. On modified q-Bessel functions and their statistical applications. In: Balakrishnan, N. (Ed.), Advances in Combinatorial Methods and Applications to Probability and Statistics. Birkhäuser, Boston, MA, pp. 451–463.
Kemp, A., 1998. Absorption sampling and the absorption distribution. J. Appl. Probab. 35, 489–494.
Kemp, A., 2001. A characterization of a distribution arising from absorption sampling. In: Charalambides, Ch.A., Koutras, M.V., Balakrishnan, N. (Eds.), Probability and Statistical Models with Applications. Chapman & Hall, CRC Press, Boca Raton, FL, pp. 239–246.
Kemp, A., 2002a. Existence conditions and properties for the generalized Euler family of distributions. J. Statist. Plann. Inference 101, 169–178.
Kemp, A., 2002b. Certain q-analogues of the binomial distribution. Sankhyā Ser. A 64, 293–305.
Kemp, A., 2003. Characterizations involving U|(U+V=m) for certain discrete distributions. J. Statist. Plann. Inference 109, 31–41.
Kemp, A., Kemp, C.D., 1991. Weldon’s dice data revisited. Amer. Statist. 45, 216–222.
Kemp, A., Kemp, C.D., 2009. The q-cluster distribution. J. Statist. Plann. Inference 139, 1856–1866.
Kemp, A., Newton, J., 1990. Certain state-dependent processes for dichotomised parasite populations. J. Appl. Probab. 27, 251–258.
Kemp, C.D., 1997. A q-logarithmic distribution. In: Balakrishnan, N. (Ed.), Advances in Combinatorial Methods and Applications to Probability and Statistics. Birkhäuser, Boston, MA, pp. 465–570.
Nevzorov, V.B., 1984. Record times in the case of non-identically distributed random variables. Theory Probab. Appl. 29, 845–846.
Newby, M., 1999. Moments and generating functions for the absorption distribution and its negative binomial analogue. Comm. Statist. Theory Methods 28, 2935–2945.
Ostrovska, S., 2003. q-Bernstein polynomials and their iterates. J. Approx. Theory 123, 232–255.
Platonov, M.L., 1976. Elementary applications of combinatorial numbers in probability theory. Theory Probab. Math. Statist. 11, 129–137.
Poisson, S.D., 1837. Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités. Bachelier, Imprimeur-Libraire pour les Mathématiques, la Physique, etc., Paris.
Rawlings, D., 1997. Absorption processes: models for q-identities. Adv. Appl. Math. 18, 133–148.
Sen, A., Balakrishnan, N., 1999. Convolution of geometrics and a reliability problem. Statist. Probab. Lett. 43, 421–426.
Woodbury, M.A., 1949. On a probability distribution. Ann. Math. Statist. 20, 311–313.
Yang, M.C.K., 1975. On the distributions of the inter-record times in an increasing population. J. Appl. Probab. 12, 148–154.
Zacks, S., Goldfarb, D., 1966. Survival probabilities in crossing a field containing absorption points. Naval Res. Logistics Quart. 13, 35–48.