Computational Statistics & Data Analysis 19 (1995) 293-307 North-Holland
Dirichlet analysis for inverse multinomial with both quota and quota-free cells

M. Ebneshahrashoob
Department of Mathematics, California State University, Long Beach, Long Beach, USA

Milton Sobel
Department of Statistics and Applied Probability, University of California, Santa Barbara, Santa Barbara, USA

Received February 1993
Revised December 1993
Abstract: For a broad variety of quota-sampling problems, we study the moment structure of the total frequencies in various cells at stopping time, including cross moments, covariances and correlations. Previously, we considered two different types of quotas: frequency quotas and run quotas. In this paper, we consider the effect of having quota-free cells present. Our main strategy is to relate the cell frequencies at stopping time to the total number of observations needed to reach the stopping time. For the case of α frequency-quota cells and γ quota-free cells, we consider the problem of stopping as soon as any J of the α frequency quotas are satisfied. Our main tools are the Dirichlet integrals, which are defined here; the reader should consult [3] and [4] for further background. Another method, based on the generalized probability generating function (gpgf), is also used to confirm the various results. The paper concludes with an example of how to use these results in two-stage sampling (with a fixed sample size n in the first stage) to make the covariance between certain total cell frequencies at stopping time equal to zero (i.e., to make them uncorrelated).
Keywords: Dirichlet applications; Multinomial stopping problem; Quota-free cells; Generalized probability generating function.
1. Introduction
As in other recent papers [1,2], the background model is a multinomial distribution with a total of c cells, all having positive known cell probabilities.

Correspondence to: M. Ebneshahrashoob, Ph.D., Department of Mathematics, California State University, Long Beach, CA 90840, USA.

0167-9473/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
SSDI 0167-9473(94)00004-3
M. Ebneshahrashoob, M. Sobel / Dirichlet Analysis
Inverse sampling is used with a particular stopping rule, which constitutes our strategy and is defined below. In our general model there are three types of cells (c = α + β + γ): α cells with cell probabilities p_i (i = 1, 2, ..., α) and positive frequency quotas f_i, respectively; a set of β cells (disjoint from the previous set) with cell probabilities p'_j (j = 1, 2, ..., β) and positive run quotas r_j, respectively; and another set of γ cells (disjoint from both previous sets) with cell probabilities q_k (k = 1, 2, ..., γ), all of which are quota-free. Hence,

$$\sum_{i=1}^{\alpha} p_i + \sum_{j=1}^{\beta} p'_j + \sum_{k=1}^{\gamma} q_k = 1, \qquad (1.1)$$
and we refer to this general model as Case (α, β; γ) or Case 3. In this more general setting, our main emphasis is on the "soonest" problem, in which we stop sampling as soon as any one of the α + β quotas is satisfied. In Section 2, the β run-quota cells are absent, and we refer to this as Case 1. For Case 1, we also deal with the Jth-ordered quota, i.e., with waiting until any J of the α quotas are satisfied; this is the "soonest" problem for J = 1 and the "latest" problem for J = α. We develop exact expressions for the moments of the total frequencies at stopping time when only cells of types 1 and 3 are present. The results differ for the frequency-quota (FQ) cells and the quota-free (QF) cells, and we are interested in cross moments as well as means, covariances and correlations. In Section 3, we briefly describe the generalized probability generating function approach to the general model of Case 3 and then give exact results for Case 2, where the α frequency-quota cells are absent. Again we use the "soonest" problem as our main example. In Section 4, we give a simple example illustrating how our results can be used in a two-stage setting (in which the first stage has a fixed sample size n) to make the total (random) frequencies at stopping time uncorrelated. This further emphasizes that our results are quite different from those of a multinomial distribution.
2. Joint moment structure for Case 1: Cells of types 1 and 3 present

All the results below for Case 1 are given in terms of Dirichlet integrals, which we therefore introduce here. The Dirichlet (type 2) integrals, called the C and D integrals, are defined, respectively, by

$$C_a^{(b)}(r,m) = \frac{\Gamma(m+br)}{\Gamma^b(r)\,\Gamma(m)} \int_0^a \cdots \int_0^a \frac{\prod_{i=1}^{b} x_i^{r-1}\,dx_i}{\left(1+\sum_{i=1}^{b} x_i\right)^{m+br}} \qquad (2.1)$$
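and

$$D_a^{(b)}(r,m) = \frac{\Gamma(m+br)}{\Gamma^b(r)\,\Gamma(m)} \int_a^{\infty} \cdots \int_a^{\infty} \frac{\prod_{i=1}^{b} x_i^{r-1}\,dx_i}{\left(1+\sum_{i=1}^{b} x_i\right)^{m+br}}, \qquad (2.2)$$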
where a ≥ 0, b is a non-negative integer, and m, r are both positive; in both (2.1) and (2.2), the value for b = 0 is 1 for any triple (m, r, a). These integrals are useful mainly because of their probability interpretation. The integrals and their probability interpretation were studied extensively in [4], where many examples of their use are given; the reader wanting more background is referred to [3] and [4]. The definitions (2.1) and (2.2) and their probability interpretation also hold for vector arguments, namely for a = (a_1, a_2, ..., a_b), if we replace the limits of integration in (2.1) and (2.2) by (a_1, a_2, ..., a_b), respectively. Similarly, we can generalize r by replacing it with the vector r = (r_1, r_2, ..., r_b) and using r_(i) to denote (r_1, ..., r_{i-1}, r_{i+1}, ..., r_b); the right-hand side of (2.1) and/or (2.2) will then contain x_i^{r_i - 1} and ∏_{i=1}^{b} Γ(r_i), and br has to be replaced by Σ_{i=1}^{b} r_i. The explanatory material in [4] explains how to use the tables of (2.1) and (2.2) with scalar (i.e., equal) arguments to obtain values of the Dirichlet integrals with vector (i.e., unequal) arguments.

Since we are interested in (the first two) moments of frequencies (at stopping time) for pairs of cells of the same or of different types, we can obtain a general result of the desired type with at most two quota-free cells. Let p denote the vector (p_1, p_2, ..., p_α), let p_(i) be the same vector with p_i omitted, and let a_i = p_(i)/p_i. The results below are written for the "soonest" problem; but, as a result of duality (see [4]), we can replace all D's by C's and automatically obtain answers for the corresponding "latest" problem; this will be illustrated below by some simple dice problems. We also indicate below how to replace the D's by CD's so that the answer applies to the Jth-ordered problem (J = 1, 2, ..., α), i.e., when we wait for any J of the α frequency quotas to be satisfied. We need to define W_1 and W_2 for the soonest problem, using f_i and f_(i) in place of r_i and r_(i), respectively; these W's are defined by
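$$W_1 = \sum_{i=1}^{\alpha} \frac{f_i}{p_i}\, D_{\mathbf{a}_i}^{(\alpha-1)}\left(\mathbf{f}_{(i)};\, f_i+1\right), \qquad (2.3)$$

$$W_2 = \sum_{i=1}^{\alpha} \frac{f_i(f_i+1)}{p_i^2}\, D_{\mathbf{a}_i}^{(\alpha-1)}\left(\mathbf{f}_{(i)};\, f_i+2\right). \qquad (2.4)$$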
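The definitions (2.1) to (2.4) can be made concrete for small integer arguments. The sketch below uses hypothetical helper names (dirichlet_D, dirichlet_C, soonest_W; this is not code from [4]) and evaluates D_a^(b)(r, m) through what we take to be its probability interpretation: when a counting cell records its m-th observation, each of b competing cells, each a times as likely as the counting cell, still has fewer than r observations. C then follows by inclusion-exclusion (expanding each ∫_0^a as ∫_0^∞ − ∫_0^∞ restricted to (a, ∞), where a full integral over (0, ∞) collapses to a lower-dimensional integral of the same form), and W_1, W_2 follow from (2.3) and (2.4) when all α quota cells share one probability p and one quota f. Exact rational arithmetic keeps the results comparable with tabled values.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod


def dirichlet_D(a, b, r, m):
    """D_a^(b)(r, m) for integer r, m >= 1 and a scalar ratio a.

    Assumed probability interpretation: when the counting cell records its
    m-th observation, each of b other cells (each a times as likely as the
    counting cell) still has fewer than r observations.
    """
    a = Fraction(a)
    p0 = 1 / (1 + a * b)       # counting cell, renormalised over the b + 1 cells
    pa = a / (1 + a * b)       # each competing cell, renormalised
    total = Fraction(0)
    for xs in product(range(r), repeat=b):      # every count vector below quota r
        s = sum(xs)
        # orderings of the m - 1 earlier counting observations and the xs counts
        ways = factorial(m - 1 + s) // (factorial(m - 1) * prod(factorial(x) for x in xs))
        total += ways * p0**m * pa**s
    return total


def dirichlet_C(a, b, r, m):
    """C_a^(b)(r, m) by inclusion-exclusion over D integrals (D^(0) = 1)."""
    return sum((-1) ** j * comb(b, j) * dirichlet_D(a, j, r, m) for j in range(b + 1))


def soonest_W(alpha, f, p):
    """W_1 and W_2 of (2.3)-(2.4) when all alpha quota cells share probability
    p and quota f, so that every component of a_i equals 1."""
    p = Fraction(p)
    W1 = alpha * Fraction(f) / p * dirichlet_D(1, alpha - 1, f, f + 1)
    W2 = alpha * Fraction(f * (f + 1)) / p**2 * dirichlet_D(1, alpha - 1, f, f + 2)
    return W1, W2


W1, W2 = soonest_W(alpha=4, f=1, p=Fraction(1, 6))
print(W1, W2, W2 - W1 * (1 + W1))   # E(N), E(N(N+1)) and Var(N) via (2.5)-(2.6)
```

With α = 4, f = 1 and p = 1/6 this prints 3/2 9/2 3/4, i.e., E(N) = 1.5 and Var(N) = 0.75 for the soonest die problem used below. The enumeration grows like r^b, so it is meant only for small quotas.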
We are interested in cross moments and correlations (i) between two quota-free (QF) cells, (ii) between two frequency-quota (FQ) cells, and (iii) between one cell of each of these two types (QF vs. FQ). In accordance with these three goals, we have three subsections below.
2.1. Two quota-free cells; stopping rule based on frequency quotas

Let Z_1 and Z_2, with cell probabilities q_1 and q_2, respectively, denote the frequencies at stopping time in two of the γ quota-free cells. Let N denote the total number of observations needed for any of our problems; for the "soonest" problem we use the notation N = N^(S) or N^(1); for the "latest" problem we write N = N^(L) or N^(α); and, more generally, for the Jth-ordered problem we write N = N^(J) (J = 1, 2, ..., α). From equation (5.8) on p. 62 of [4], we already know that the ascending factorial moments of N^(1) for the soonest problem are given by

$$E\{N^{(1)}\} = W_1, \qquad E\{N^{(1)}(N^{(1)}+1)\} = W_2, \qquad (2.5)$$
and hence

$$\operatorname{Var}\{N^{(1)}\} = W_2 - W_1(1+W_1), \qquad (2.6)$$

where W_i (i = 1, 2) is given by (2.3) and (2.4), respectively. For the "latest" problem, the same results hold with D replaced everywhere by C. Similarly, for the Jth-ordered problem, we replace D^(α-1) by CD^(J-1, α-J), with the other arguments unchanged (e.g., in both (2.3) and (2.4)), and multiply the result by the combinatorial $\binom{\alpha-1}{J-1}$, where α ≥ J are known positive integers. In any general sequential setting, if X is the frequency at stopping time in any cell (QF, FQ or otherwise) with cell probability p, then it follows from the well-known Wald's theorem that

$$E(X) = p\,E(N), \qquad (2.7)$$
where N is the (random) total number of observations needed to reach a stopping point. Applying this to Case 1, we can write for the kth quota-free cell (k = 1, 2, ..., γ)

$$E(Z_k) = q_k \sum_{i=1}^{\alpha} \frac{f_i}{p_i}\, D_{\mathbf{a}_i}^{(\alpha-1)}\left(\mathbf{f}_{(i)};\, f_i+1\right) = q_k W_1. \qquad (2.8)$$
Similarly, for any frequency-quota cell X_i, we have the same relation (2.8) with q_k replaced by p_i. In particular, the ratio of E(X) to E(N) does not depend on the size of the frequency quota, but only on the probability of the cell involved. Using the same reference as above, (5.8) of [4], we can also show that the second descending factorial moment is

$$E[Z_k(Z_k-1)] = q_k^2 \sum_{i=1}^{\alpha} \frac{f_i(f_i+1)}{p_i^2}\, D_{\mathbf{a}_i}^{(\alpha-1)}\left(\mathbf{f}_{(i)};\, f_i+2\right) = q_k^2 W_2. \qquad (2.9)$$
From (2.8) and (2.9), we also obtain the variance

$$\operatorname{Var}(Z_k) = q_k^2 W_2 + q_k W_1 - (q_k W_1)^2. \qquad (2.10)$$
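For two distinct quota-free cells whose frequencies at stopping time are denoted by Z_1, Z_2, we have

$$E(Z_1 Z_2) = q_1 q_2 W_2, \qquad (2.11)$$

and hence

$$\operatorname{Cov}(Z_1, Z_2) = q_1 q_2 \left(W_2 - W_1^2\right) = q_1 q_2 \left[\operatorname{Var}(N^{(1)}) + E(N^{(1)})\right], \qquad (2.12)$$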
which is clearly positive except in a totally degenerate situation. This already shows a property unlike (and, in fact, quite different from) that of the usual multinomial; more such comparisons are discussed later. As an illustration, consider the Jth-ordered problem for tossing a single fair (6-sided) die, where faces 1, 2, 3 and 4 each have common frequency quota f = 1 and the remaining two faces are quota-free. Below we consider four problems, namely J = 1, 2, 3 and 4. For the soonest (J = 1) problem we obtain, using the same notation for W and Z with superscript (1) to denote soonest,
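$$W_1^{(1)} = 6(4)\,D_1^{(3)}(1;2) = 1.5, \qquad W_2^{(1)} = 36(8)\,D_1^{(3)}(1;3) = 4.5,$$
$$E(Z_1^{(1)}) = E(Z_2^{(1)}) = 0.25, \qquad \operatorname{Var}(Z_1^{(1)}) = \operatorname{Var}(Z_2^{(1)}) = 0.3125,$$
$$\operatorname{Cov}(Z_1^{(1)}, Z_2^{(1)}) = 0.0625, \qquad \operatorname{Corr}(Z_1^{(1)}, Z_2^{(1)}) = 0.2 > 0. \qquad (2.13)$$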
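A quick simulation offers an independent check on (2.7) and (2.13). The sketch below (the name simulate_die, the seed and the replication count are arbitrary choices, not from the paper) tosses a fair die until J of the faces 1 to 4 have appeared and records the counts of faces 5 and 6. With 200,000 replications the estimates should agree with (2.13) to roughly two decimal places, and the estimated covariance should come out positive, unlike the negative covariance of the usual multinomial.

```python
import random


def simulate_die(J, n_rep=200_000, seed=1):
    """Monte Carlo for the Jth-ordered die problem: faces 1-4 have quota
    f = 1, faces 5 and 6 are quota-free. Returns estimates of E(N), E(Z1),
    Var(Z1) and Cov(Z1, Z2)."""
    rng = random.Random(seed)
    tot_n = tot_z1 = tot_z2 = tot_z1sq = tot_z1z2 = 0
    for _ in range(n_rep):
        seen = set()                  # quota faces that have met f = 1
        n = z1 = z2 = 0
        while len(seen) < J:          # stop once J quotas are satisfied
            face = rng.randint(1, 6)
            n += 1
            if face <= 4:
                seen.add(face)
            elif face == 5:
                z1 += 1
            else:
                z2 += 1
        tot_n += n
        tot_z1 += z1
        tot_z2 += z2
        tot_z1sq += z1 * z1
        tot_z1z2 += z1 * z2
    e_n, e_z1, e_z2 = tot_n / n_rep, tot_z1 / n_rep, tot_z2 / n_rep
    var_z1 = tot_z1sq / n_rep - e_z1**2
    cov_z1z2 = tot_z1z2 / n_rep - e_z1 * e_z2
    return e_n, e_z1, var_z1, cov_z1z2


print(simulate_die(J=1))   # compare with 1.5, 0.25, 0.3125, 0.0625 in (2.13)
```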
This completes our discussion of the soonest problem (J = 1) for quota-free cells. For the Jth-ordered problem, to calculate CD^(J-1, α-J) from the tables, we use either of the following two results:

$$CD^{(J-1,\,\alpha-J)} = \sum_{i=0}^{J-1} (-1)^i \binom{J-1}{i} D^{(\alpha-J+i)} = \sum_{j=0}^{\alpha-J} (-1)^j \binom{\alpha-J}{j} C^{(J-1+j)}, \qquad (2.14)$$

with all the missing vector arguments remaining the same on both sides of (2.14). Below, we use (2.8) through (2.12) and (2.14) for each J (J = 2, 3, 4), with the notation Z^(J) and W^(J) for Z and W, respectively. For the second-ordered quota (J = 2) in the above die problem, we obtain
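$$W_1^{(2)} = 72\,CD_1^{(1,2)}(1;2) = 3.5, \qquad W_2^{(2)} = 864\,CD_1^{(1,2)}(1;3) = 18.5,$$
$$E(Z_1^{(2)}) = E(Z_2^{(2)}) = 0.583333, \qquad \operatorname{Var}(Z_1^{(2)}) = \operatorname{Var}(Z_2^{(2)}) = 0.756944,$$
$$\operatorname{Cov}(Z_1^{(2)}, Z_2^{(2)}) = 0.173611, \qquad \operatorname{Corr}(Z_1^{(2)}, Z_2^{(2)}) = 0.229358 > 0. \qquad (2.15)$$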
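The two forms of (2.14) can be checked against each other in the die example, where every ratio equals 1 and every quota is 1. For unit quotas, D_a^(b)(1, m) reduces to (1 + ab)^(−m), the chance that none of b competing cells appears before the m-th observation of the counting cell. The sketch below (hypothetical names) computes C independently by first-step analysis rather than from D, confirms that both sides of (2.14) agree, and recovers the multipliers used in (2.15).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def D_unit(a, b, m):
    """D_a^(b)(1, m) = (1 + a b)^(-m): none of b cells (ratio a) is seen
    before the counting cell's m-th observation."""
    return Fraction(1) / (1 + Fraction(a) * b) ** m


def C_unit(a, b, m):
    """C_a^(b)(1, m): all b cells are seen before the counting cell's m-th
    observation, computed by first-step analysis (not from D)."""
    a = Fraction(a)

    @lru_cache(maxsize=None)
    def prob(seen, counted):
        if seen == b:
            return Fraction(1)
        if counted == m:
            return Fraction(0)
        w = a * (b - seen)              # relative weight of still-unseen cells
        return (prob(seen, counted + 1) + w * prob(seen + 1, counted)) / (1 + w)

    return prob(0, 0)


def CD_via_D(J, alpha, a, m):
    """First form of (2.14)."""
    return sum((-1) ** i * comb(J - 1, i) * D_unit(a, alpha - J + i, m) for i in range(J))


def CD_via_C(J, alpha, a, m):
    """Second form of (2.14)."""
    return sum((-1) ** j * comb(alpha - J, j) * C_unit(a, J - 1 + j, m)
               for j in range(alpha - J + 1))


for J in range(1, 5):
    assert CD_via_D(J, 4, 1, 2) == CD_via_C(J, 4, 1, 2)
print(CD_via_D(2, 4, 1, 2))   # CD_1^(1,2)(1; 2); times 72 gives W_1^(2)
```

For J = 2 it prints 7/144, and 72 × 7/144 = 3.5 as in (2.15).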
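For the third-ordered quota (J = 3), we obtain

$$W_1^{(3)} = 72\,CD_1^{(2,1)}(1;2) = 6.5, \qquad W_2^{(3)} = 864\,CD_1^{(2,1)}(1;3) = 57.5,$$
$$E(Z_1^{(3)}) = E(Z_2^{(3)}) = 1.083333, \qquad \operatorname{Var}(Z_1^{(3)}) = \operatorname{Var}(Z_2^{(3)}) = 1.506944,$$
$$\operatorname{Cov}(Z_1^{(3)}, Z_2^{(3)}) = 0.423611, \qquad \operatorname{Corr}(Z_1^{(3)}, Z_2^{(3)}) = 0.281106 > 0. \qquad (2.16)$$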
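For the latest problem (J = 4), we obtain

$$W_1^{(4)} = 24\,C_1^{(3)}(1;2) = 12.5, \qquad W_2^{(4)} = 288\,C_1^{(3)}(1;3) = 207.5,$$
$$E(Z_1^{(4)}) = E(Z_2^{(4)}) = 2.083333, \qquad \operatorname{Var}(Z_1^{(4)}) = \operatorname{Var}(Z_2^{(4)}) = 3.506944,$$
$$\operatorname{Cov}(Z_1^{(4)}, Z_2^{(4)}) = 1.423611, \qquad \operatorname{Corr}(Z_1^{(4)}, Z_2^{(4)}) = 0.405941 > 0. \qquad (2.17)$$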
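Putting (2.8) to (2.12) together with the first form of (2.14), the whole die example can be reproduced in exact arithmetic. The sketch below relies on the same unit-quota closed form D_1^(b)(1, m) = (1 + b)^(−m); die_moments and its default arguments are illustrative names, with the multiplier $\binom{\alpha-1}{J-1}$ applied to both W's as described after (2.6).

```python
from fractions import Fraction
from math import comb


def D_unit(b, m):
    """D_1^(b)(1, m) = (1 + b)^(-m): closed form for unit quotas and ratio 1."""
    return Fraction(1, (1 + b) ** m)


def CD_unit(J, alpha, m):
    """CD_1^(J-1, alpha-J)(1; m) from the first form of (2.14)."""
    return sum((-1) ** i * comb(J - 1, i) * D_unit(alpha - J + i, m) for i in range(J))


def die_moments(J, alpha=4, p=Fraction(1, 6), q=Fraction(1, 6)):
    """(2.8)-(2.12) for the Jth-ordered die problem with common quota f = 1."""
    mult = comb(alpha - 1, J - 1) * alpha        # combinatorial times number of cells
    W1 = mult / p * CD_unit(J, alpha, 2)          # f / p = 1 / p
    W2 = mult * 2 / p**2 * CD_unit(J, alpha, 3)   # f (f + 1) / p^2 = 2 / p^2
    var_z = q**2 * W2 + q * W1 - (q * W1) ** 2    # (2.10)
    cov = q * q * (W2 - W1**2)                    # (2.12) with q1 = q2 = q
    return {"W1": W1, "W2": W2, "E(Z)": q * W1, "Var(Z)": var_z,
            "Cov": cov, "Corr": cov / var_z}


for J in range(1, 5):
    res = die_moments(J)
    print(J, float(res["W1"]), float(res["W2"]), round(float(res["Corr"]), 6))
```

For α = 4 the loop prints the W_1, W_2 and correlation values of (2.13) to (2.17); calling die_moments(J, alpha=i) with i = 2 or 3 gives the f = 1 entries of Table 1.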
It is interesting to make several remarks at this point about the correlation ρ between two quota-free cell frequencies in our model, either for all f (the common frequency quota) and all J, or in the limit as f → ∞.

(1) It follows from (2.12) that Corr(Z_1, Z_2) > 0 for all f and all J. This already indicates a fundamental difference from the usual multinomial result, where the correlations are negative; in fact, we cannot approach the multinomial case even by going to the limit in f.

(2) Note in Table 1 that the correlations are increasing in f for each soonest problem, i.e., for each row with J = 1; they are decreasing in f for each latest problem, i.e., for each row with J = i. It can be shown that, for all values of f and J, ρ lies between the lower bound √(q_1q_2/((1 + q_1)(1 + q_2))) and the upper bound 1/2, inclusive; the value 1/2 is attained when there is only one quota cell, so that the soonest and latest problems coincide.

(3) For the case α = 1, the correlation ρ = ρ(Z_1, Z_2) does not depend on f and, in fact (since J = 1 also), we have ρ(Z_1, Z_2) = √(q_1q_2/((p + q_1)(p + q_2))), where p is the cell probability of the single quota cell. For p = q_1 = q_2, this result gives ρ = 1/2, as in the top line of Table 1.

(4) For the soonest problem with α > 1, we claim that as f → ∞ the correlation ρ(Z_1, Z_2) approaches the upper bound 1/2. For the latest problem with α > 1, we claim that as f → ∞ the correlation ρ(Z_1, Z_2) approaches the lower bound given above in (2).

Many of the properties mentioned above are illustrated numerically in Table 1 below, which deals with tossing a fair (6-sided) die with i faces (i ≤ 4) having common quota f and two quota-free faces. All statements above are consistent with the results in this table.
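Remark (3) takes only a few lines to verify. With α = 1 there are no competing quota cells, so D^(0) = 1 and (2.3), (2.4) give W_1 = f/p and W_2 = f(f + 1)/p^2; substituting into (2.10) and (2.12):

```latex
\operatorname{Cov}(Z_1,Z_2) = q_1q_2\,(W_2 - W_1^2) = \frac{q_1q_2\,f}{p^2},
\qquad
\operatorname{Var}(Z_k) = q_k^2W_2 + q_kW_1 - q_k^2W_1^2 = \frac{q_k f\,(p+q_k)}{p^2},
\\[4pt]
\rho(Z_1,Z_2) = \frac{q_1q_2 f/p^2}{\sqrt{q_1q_2 f^2 (p+q_1)(p+q_2)/p^4}}
             = \sqrt{\frac{q_1q_2}{(p+q_1)(p+q_2)}} .
```

The factor f cancels, which is why the top row of Table 1 does not change with f.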
2.2. Two frequency-quota cells; stopping rule based on frequency quotas

Let X_1 and X_2 denote the frequencies at stopping time for two distinct frequency-quota cells with cell probabilities p_1 and p_2 and frequency quotas f_1 and f_2, respectively. In this section, we derive an expression for E{X_1X_2} under the stopping rule that stops as soon as any J of the α quotas are satisfied, i.e., for the Jth-ordered case. The first-order moments are given above in (2.3), (2.5) and (2.7), and we now concentrate on second-order moments, including the variance. After deriving a general result that holds for any J (1 ≤ J ≤ α) ...
Table 1. Numerical monotonicities of ρ_QF,QF (the correlation between two quota-free cells, or faces) and of E{N} (the expected number of tosses of a fair die needed to satisfy any J quotas) when i of the 6 faces have a common quota f (1 ≤ J ≤ i ≤ 6)

| i | J | Quantity | f = 1 | f = 5 | f = 10 |
|---|---|---|---|---|---|
| 1 | 1 | E{N} | 6.00000 | 30.00000 | 60.00000 |
| 1 | 1 | ρ_QF,QF | 0.50000 | 0.50000 | 0.50000 |
| 2 | 1 | E{N} | 3.00000 | 22.61719 | 49.42818 |
| 2 | 1 | ρ_QF,QF | 0.33333 | 0.37435 | 0.38391 |
| 2 | 2 | E{N} | 9.00000 | 37.38281 | 70.57182 |
| 2 | 2 | ρ_QF,QF | 0.45455 | 0.43085 | 0.42398 |
| 3 | 1 | E{N} | 2.00000 | 19.46841 | 44.68938 |
| 3 | 1 | ρ_QF,QF | 0.25000 | 0.31357 | 0.32784 |
| 3 | 2 | E{N} | 5.00000 | 28.91475 | 58.90578 |
| 3 | 2 | ρ_QF,QF | 0.30233 | 0.30845 | 0.30910 |
| 3 | 3 | E{N} | 11.00000 | 41.61685 | 76.40484 |
| 3 | 3 | ρ_QF,QF | 0.42609 | 0.39381 | 0.38449 |
| 4 | 1 | E{N} | 1.50000 | 17.60514 | 41.79882 |
| 4 | 1 | ρ_QF,QF | 0.20000 | 0.27648 | 0.29346 |
| 4 | 2 | E{N} | 3.50000 | 25.05820 | 53.36103 |
| 4 | 2 | ρ_QF,QF | 0.22936 | 0.25214 | 0.25633 |
| 4 | 3 | E{N} | 6.50000 | 32.77129 | 64.45054 |
| 4 | 3 | ρ_QF,QF | 0.28111 | 0.27430 | 0.27188 |
| 4 | 4 | E{N} | 12.50000 | 44.56536 | 80.38962 |
| 4 | 4 | ρ_QF,QF | 0.40594 | 0.36955 | 0.35903 |
Note that, in the range of the table, the correlation increases with f for 1 ≤ J ≤ (i + 1)/2 and decreases with f for (i + 1)/2 < J ≤ i. Note also that the correlation ρ_QF,QF is always positive and, indeed, p/(1 + p) ≤ ρ_QF,QF ≤ 1/2 for all values of f, i and J, where p = 1/6 is the common cell probability.
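Entries of Table 1 for f > 1 need Dirichlet integrals with r > 1. The sketch below evaluates D_1^(b)(r, m) by summing the (assumed) probability interpretation over all count vectors below the quota, combines the results through the first form of (2.14), and applies (2.3), (2.4), (2.10) and (2.12) with the combinatorial multiplier. The names are illustrative; the cost grows like f^(i-1), which is fine for the table's range but not for large quotas.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod


def dirichlet_D_equal(b, r, m):
    """D_1^(b)(r, m) with all ratios 1 (fair die), integer r and m.
    Assumed interpretation: when the counting cell records its m-th
    observation, each of b other cells has fewer than r observations."""
    share = Fraction(1, b + 1)        # renormalised probability of each involved cell
    total = Fraction(0)
    for xs in product(range(r), repeat=b):
        s = sum(xs)
        ways = factorial(m - 1 + s) // (factorial(m - 1) * prod(factorial(x) for x in xs))
        total += ways * share ** (m + s)
    return total


def table1_entry(i, J, f, p=Fraction(1, 6), q=Fraction(1, 6)):
    """E{N} and rho_QF,QF for i quota faces with common quota f."""
    def CD(m):                                    # first form of (2.14)
        return sum((-1) ** k * comb(J - 1, k) * dirichlet_D_equal(i - J + k, f, m)
                   for k in range(J))

    mult = comb(i - 1, J - 1) * i                 # combinatorial times number of cells
    W1 = mult * f / p * CD(f + 1)                 # (2.3) with D replaced by CD
    W2 = mult * f * (f + 1) / p**2 * CD(f + 2)    # (2.4) with D replaced by CD
    cov = q * q * (W2 - W1**2)                    # (2.12)
    var = q * q * W2 + q * W1 - (q * W1) ** 2     # (2.10)
    return float(W1), float(cov / var)


for i, J in [(2, 1), (2, 2)]:
    print(i, J, table1_entry(i, J, f=5))
```

The two printed pairs can be compared with the f = 5 column of Table 1 for (i, J) = (2, 1) and (2, 2).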
For the two cells corresponding to X_1 and X_2, we consider the following eight cases, according to whether one of them is the "terminal" cell and whether one, both or neither have already reached their quotas at the time of stopping; we write the fact that X_i has reached its quota as X_i ≥ f_i.
Case 1: Stop with cell #1; X_2 ≥ f_2; J-2 others also reached their quotas but α-J did not.
Case 2: Stop with cell #1; X_2 < f_2; J-1 others also reached their quotas but α-J-1 did not.
Cases 3 and 4: Interchange the indices 1 and 2 in the above two cases.
Case 5: X_1 ≥ f_1, X_2 ≥ f_2; some other cell is terminal; J-3 others also reached their quotas but α-J did not.
Case 6: X_1 ≥ f_1, X_2 < f_2; some other cell is terminal; J-2 others also reached their quotas but α-J-1 did not.
Case 7: Interchange the indices 1 and 2 in Case 6.
Case 8: X_1 < f_1, X_2 < f_2; some other cell is terminal; J-1 others also reached their quotas but α-J-2 did not.

We present the contributions to E{X_1X_2} separately for each of these eight cases; the final result for E{X_1X_2} is the sum of the eight contributions. We use the notation T_11, T_12, etc., and T_i = T_i1 + T_i2 + ··· for the contribution to E{X_1X_2} from Case i (i = 1, 2, ..., 8). Thus, for Case 1, we obtain

$$T_1 = T_{11} - T_{12}, \qquad (2.18)$$

where
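$$T_{11} = \frac{f_1^2\,p_2}{p_1} \sum^{\binom{\alpha-2}{J-2}} CD^{(J-2,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,2}}{p_1},\,\frac{\mathbf{p}''_{1,2}}{p_1}\right)}\left(\mathbf{f}'_{1,2},\,\mathbf{f}''_{1,2};\; f_1+1\right), \qquad (2.19)$$

$$T_{12} = \sum_{x=0}^{f_2-1} f_1 x \binom{f_1+x-1}{f_1-1}\left(\frac{p_1}{p_1+p_2}\right)^{f_1}\left(\frac{p_2}{p_1+p_2}\right)^{x} \times \sum^{\binom{\alpha-2}{J-2}} CD^{(J-2,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,2}}{p_1+p_2},\,\frac{\mathbf{p}''_{1,2}}{p_1+p_2}\right)}\left(\mathbf{f}'_{1,2},\,\mathbf{f}''_{1,2};\; f_1+x\right), \qquad (2.20)$$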
where × is used here and below as a product symbol. Summations in front of a CD (here and below) have a combinatorial as upper limit. For example, with upper limit $\binom{\alpha-2}{J-2}$, the summation represents a sum over all possible subsets of size J − 2 from the α − 2 quota cells other than cells 1 and 2; the number of terms is therefore $\binom{\alpha-2}{J-2}$, and in the simplest case, with common p and common f, we replace the summation by the combinatorial above it (as a multiplier). The notation (v', v'') here and below, for the f-argument and also for the p-subscript of CD, represents a partition of the corresponding vector v (with the cells indicated by its subscripts, e.g., 1,2 or 1,2,i, omitted) into two parts corresponding to the two superscripts on the same CD. For Case 2, we obtain the single summand
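$$T_2 = \sum_{x=1}^{f_2-1} f_1 x \binom{f_1+x-1}{f_1-1}\left(\frac{p_1}{p_1+p_2}\right)^{f_1}\left(\frac{p_2}{p_1+p_2}\right)^{x} \times \sum^{\binom{\alpha-2}{J-1}} CD^{(J-1,\,\alpha-J-1)}_{\left(\frac{\mathbf{p}'_{1,2}}{p_1+p_2},\,\frac{\mathbf{p}''_{1,2}}{p_1+p_2}\right)}\left(\mathbf{f}'_{1,2},\,\mathbf{f}''_{1,2};\; f_1+x\right). \qquad (2.21)$$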
For Cases 3 and 4, the results are obtained by a simple interchange of indices, and we omit them. For Case 5, we have the four summands

$$T_5 = T_{51} - T_{52} - T_{53} + T_{54}, \qquad (2.22)$$

where
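$$T_{51} = \sum_{i=3}^{\alpha} f_i(f_i+1)\left(\frac{p_1}{p_i}\right)\left(\frac{p_2}{p_i}\right) \sum^{\binom{\alpha-3}{J-3}} CD^{(J-3,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,2,i}}{p_i},\,\frac{\mathbf{p}''_{1,2,i}}{p_i}\right)}\left(\mathbf{f}'_{1,2,i},\,\mathbf{f}''_{1,2,i};\; f_i+2\right), \qquad (2.23)$$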
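$$T_{52} = \sum_{i=3}^{\alpha}\sum_{x=1}^{f_2-1} f_i x \binom{f_i+x}{x}\left(\frac{p_1}{p_2+p_i}\right)\left(\frac{p_2}{p_2+p_i}\right)^{x}\left(\frac{p_i}{p_2+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-3}{J-3}} CD^{(J-3,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,2,i}}{p_2+p_i},\,\frac{\mathbf{p}''_{1,2,i}}{p_2+p_i}\right)}\left(\mathbf{f}'_{1,2,i},\,\mathbf{f}''_{1,2,i};\; 1+x+f_i\right). \qquad (2.24)$$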
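For T_53, we interchange the indices 1 and 2 in (2.24). For T_54, we obtain

$$T_{54} = \sum_{i=3}^{\alpha}\sum_{x=1}^{f_1-1}\sum_{y=1}^{f_2-1} xy\,\frac{(x+y+f_i-1)!}{x!\,y!\,(f_i-1)!}\left(\frac{p_1}{p_1+p_2+p_i}\right)^{x}\left(\frac{p_2}{p_1+p_2+p_i}\right)^{y}\left(\frac{p_i}{p_1+p_2+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-3}{J-3}} CD^{(J-3,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,2,i}}{p_1+p_2+p_i},\,\frac{\mathbf{p}''_{1,2,i}}{p_1+p_2+p_i}\right)}\left(\mathbf{f}'_{1,2,i},\,\mathbf{f}''_{1,2,i};\; x+y+f_i\right). \qquad (2.25)$$

For Case 6, we obtain only two summands

$$T_6 = T_{61} - T_{62}, \qquad (2.26)$$

where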
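$$T_{61} = \sum_{i=3}^{\alpha}\sum_{x=1}^{f_2-1} f_i x \binom{f_i+x}{x}\left(\frac{p_1}{p_2+p_i}\right)\left(\frac{p_2}{p_2+p_i}\right)^{x}\left(\frac{p_i}{p_2+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-3}{J-2}} CD^{(J-2,\,\alpha-J-1)}_{\left(\frac{\mathbf{p}'_{1,2,i}}{p_2+p_i},\,\frac{\mathbf{p}''_{1,2,i}}{p_2+p_i}\right)}\left(\mathbf{f}'_{1,2,i},\,\mathbf{f}''_{1,2,i};\; 1+x+f_i\right), \qquad (2.27)$$

$$T_{62} = \sum_{i=3}^{\alpha}\sum_{x=1}^{f_1-1}\sum_{y=1}^{f_2-1} xy\,\frac{(x+y+f_i-1)!}{x!\,y!\,(f_i-1)!}\left(\frac{p_1}{p_1+p_2+p_i}\right)^{x}\left(\frac{p_2}{p_1+p_2+p_i}\right)^{y}\left(\frac{p_i}{p_1+p_2+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-3}{J-2}} CD^{(J-2,\,\alpha-J-1)}_{\left(\frac{\mathbf{p}'_{1,2,i}}{p_1+p_2+p_i},\,\frac{\mathbf{p}''_{1,2,i}}{p_1+p_2+p_i}\right)}\left(\mathbf{f}'_{1,2,i},\,\mathbf{f}''_{1,2,i};\; x+y+f_i\right). \qquad (2.28)$$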
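For Case 7, we just interchange the indices 1 and 2 in Case 6. Finally, for Case 8, we have only a single summand

$$T_8 = \sum_{i=3}^{\alpha}\sum_{x=1}^{f_1-1}\sum_{y=1}^{f_2-1} xy\,\frac{(x+y+f_i-1)!}{x!\,y!\,(f_i-1)!}\left(\frac{p_1}{p_1+p_2+p_i}\right)^{x}\left(\frac{p_2}{p_1+p_2+p_i}\right)^{y}\left(\frac{p_i}{p_1+p_2+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-3}{J-1}} CD^{(J-1,\,\alpha-J-2)}_{\left(\frac{\mathbf{p}'_{1,2,i}}{p_1+p_2+p_i},\,\frac{\mathbf{p}''_{1,2,i}}{p_1+p_2+p_i}\right)}\left(\mathbf{f}'_{1,2,i},\,\mathbf{f}''_{1,2,i};\; x+y+f_i\right). \qquad (2.29)$$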
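For α = 2 and the soonest problem (J = 1), Cases 1, 3, 5, 6 and 7 cannot occur, and Case 8 needs a third quota cell to be terminal, so E{X_1X_2} = T_2 + T_4 with every CD factor equal to 1. The sketch below compares that sum, written with s = p_1/(p_1 + p_2), against a direct first-step recursion over the counts (x_1, x_2); the quota-free cells only delay stopping and do not affect X_1 or X_2. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def e_x1x2_formula(f1, f2, s):
    """T2 + T4 of (2.21) for alpha = 2 and J = 1, where every CD factor is 1.
    s = p1 / (p1 + p2) is the share of cell 1 within the pair."""
    T2 = sum(f1 * x * comb(f1 + x - 1, f1 - 1) * s**f1 * (1 - s) ** x
             for x in range(1, f2))
    T4 = sum(f2 * x * comb(f2 + x - 1, f2 - 1) * (1 - s) ** f2 * s**x   # indices 1 <-> 2
             for x in range(1, f1))
    return T2 + T4


def e_x1x2_recursive(f1, f2, s):
    """The same moment by first-step analysis on the counts (x1, x2)."""
    @lru_cache(maxsize=None)
    def v(x1, x2):                      # both counts are still below their quotas
        after1 = (x1 + 1) * x2 if x1 + 1 == f1 else v(x1 + 1, x2)
        after2 = x1 * (x2 + 1) if x2 + 1 == f2 else v(x1, x2 + 1)
        return s * after1 + (1 - s) * after2
    return v(0, 0)


s = Fraction(1, 2)                      # fair die: p1 = p2 = 1/6
print(e_x1x2_formula(3, 2, s), e_x1x2_recursive(3, 2, s))
```

For f_1 = 3, f_2 = 2 and a fair die, both functions return 29/16.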
It is easy to see from the analysis of the eight cases that some of them can be omitted for special problems. Thus, for the soonest (J = 1) problem we need only T_2 + T_4 + T_8, and for the latest (J = α) problem we need only T_1 + T_3 + T_5. The apparent proliferation of terms (as in the case of T_5) is due to our wish to avoid infinite series that we could not sum exactly; in this way we obtain finite exact answers for all the moments of interest. Since we now have E{X_1X_2} for two frequency-quota cells, we can easily obtain the covariance between X_1 and X_2.

To obtain a correlation, we still need the variance of one such cell, say X_1. For this purpose, we now consider the descending factorial moment E{X_1(X_1 − 1)}. In analogy with the above eight cases, we now have three cases, as follows. Recall that α is the number of quota cells.
Case 1: Stop with cell #1.
Case 2: X_1 ≥ f_1; some other cell is terminal; J-2 others also reached their quotas but α-J did not.
Case 3: X_1 < f_1; some other cell is terminal; J-1 others also reached their quotas but α-J-1 did not.
We use the same T-notation as before, but note that the T-values here are not related to the T-values above or to those used in later derivations. For Case 1, we obtain (without any summation over x) the single term

$$T_1 = f_1(f_1-1) \sum^{\binom{\alpha-1}{J-1}} CD^{(J-1,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1}}{p_1},\,\frac{\mathbf{p}''_{1}}{p_1}\right)}\left(\mathbf{f}'_{1},\,\mathbf{f}''_{1};\; f_1\right). \qquad (2.30)$$
For Case 2, we use the difference of two summands

$$T_2 = T_{21} - T_{22}, \qquad (2.31)$$
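where

$$T_{21} = \sum_{i=2}^{\alpha} f_i(f_i+1)\left(\frac{p_1}{p_i}\right)^{2} \sum^{\binom{\alpha-2}{J-2}} CD^{(J-2,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,i}}{p_i},\,\frac{\mathbf{p}''_{1,i}}{p_i}\right)}\left(\mathbf{f}'_{1,i},\,\mathbf{f}''_{1,i};\; f_i+2\right), \qquad (2.32)$$

$$T_{22} = \sum_{i=2}^{\alpha}\sum_{x=2}^{f_1-1} f_i(f_i+1)\binom{f_i+x-1}{x-2}\left(\frac{p_1}{p_1+p_i}\right)^{x}\left(\frac{p_i}{p_1+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-2}{J-2}} CD^{(J-2,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,i}}{p_1+p_i},\,\frac{\mathbf{p}''_{1,i}}{p_1+p_i}\right)}\left(\mathbf{f}'_{1,i},\,\mathbf{f}''_{1,i};\; f_i+x\right). \qquad (2.33)$$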
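For Case 3, we again have only one T-value, namely

$$T_3 = \sum_{i=2}^{\alpha}\sum_{x=2}^{f_1-1} f_i(f_i+1)\binom{f_i+x-1}{x-2}\left(\frac{p_1}{p_1+p_i}\right)^{x}\left(\frac{p_i}{p_1+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-2}{J-1}} CD^{(J-1,\,\alpha-J-1)}_{\left(\frac{\mathbf{p}'_{1,i}}{p_1+p_i},\,\frac{\mathbf{p}''_{1,i}}{p_1+p_i}\right)}\left(\mathbf{f}'_{1,i},\,\mathbf{f}''_{1,i};\; f_i+x\right). \qquad (2.34)$$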
For the general Jth-ordered problem we use all three terms, i.e., T_1 + T_2 + T_3, but for the soonest (J = 1) problem we need only T_1 + T_3; similarly, for the latest (J = α) problem we need only T_1 + T_2. This completes the calculation of E{X_1(X_1 − 1)} for any frequency-quota cell with a stopping rule based on frequency quotas, and from it we can easily obtain the variance of X_1. Note that X_1 is X_1^(J) for the Jth-ordered problem; we have dropped the superscript above.
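The same kind of check applies to E{X_1(X_1 − 1)}: for α = 2 and the soonest problem, T_1 + T_3 from (2.30) and (2.34) should equal the value found by first-step recursion. In (2.30) the CD factor then reduces to D^(1)_{p_2/p_1}(f_2; f_1), the probability that cell 1 reaches its quota first, which the illustrative helper d_pair evaluates by summing negative binomial terms; all names here are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def d_pair(r_other, m, share):
    """D^(1)(r_other; m) for one competing cell: the counting cell, with
    share `share` of the pair, records m observations while the other cell
    still has fewer than r_other."""
    return sum(comb(m - 1 + x, x) * share**m * (1 - share) ** x for x in range(r_other))


def e_x1x1m1_formula(f1, f2, s):
    """T1 + T3 of (2.30) and (2.34) for alpha = 2 and J = 1; s = p1/(p1+p2)."""
    T1 = f1 * (f1 - 1) * d_pair(f2, f1, s)
    T3 = sum(f2 * (f2 + 1) * comb(f2 + x - 1, x - 2) * s**x * (1 - s) ** f2
             for x in range(2, f1))
    return T1 + T3


def e_x1x1m1_recursive(f1, f2, s):
    """E{X1 (X1 - 1)} by first-step analysis on the counts (x1, x2)."""
    @lru_cache(maxsize=None)
    def v(x1, x2):
        after1 = (x1 + 1) * x1 if x1 + 1 == f1 else v(x1 + 1, x2)
        after2 = x1 * (x1 - 1) if x2 + 1 == f2 else v(x1, x2 + 1)
        return s * after1 + (1 - s) * after2
    return v(0, 0)


s = Fraction(1, 2)
print(e_x1x1m1_formula(3, 2, s), e_x1x1m1_recursive(3, 2, s))
```

With f_1 = 3, f_2 = 2 and s = 1/2 both return 9/4; together with E(X_1) = p_1 W_1 from (2.7), this gives Var(X_1).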
2.3. One frequency-quota and one quota-free cell; stopping rule based on frequency quotas

Since we already have E(X_1) and σ²(X_1) for the frequency-quota cell, and E(Z_1) and σ²(Z_1) for the quota-free cell, in the general Jth-ordered problem, we need only consider the cross moment E{X_1Z_1} in this section. We again require three cases for the particular pair of cells whose frequencies at termination are X_1 and Z_1.
Case 1: Stop with cell #1, so that X_1 = f_1; J-1 other quota cells reached their quotas and α-J did not.
Case 2: X_1 ≥ f_1; J-1 others reached their quotas, one of which is terminal, but α-J did not.
Case 3: X_1 < f_1; J others reached their quotas, one of which is terminal, but α-J-1 did not.
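For Case 1, we obtain the single summand

$$T_1 = \frac{f_1^2\,q_1}{p_1} \sum^{\binom{\alpha-1}{J-1}} CD^{(J-1,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1}}{p_1},\,\frac{\mathbf{p}''_{1}}{p_1}\right)}\left(\mathbf{f}'_{1},\,\mathbf{f}''_{1};\; f_1+1\right). \qquad (2.35)$$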
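For Case 2, we obtain the difference of two summands, namely

$$T_2 = \sum_{i=2}^{\alpha} f_i(f_i+1)\left(\frac{q_1}{p_i}\right)\left(\frac{p_1}{p_i}\right) \sum^{\binom{\alpha-2}{J-2}} CD^{(J-2,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,i}}{p_i},\,\frac{\mathbf{p}''_{1,i}}{p_i}\right)}\left(\mathbf{f}'_{1,i},\,\mathbf{f}''_{1,i};\; f_i+2\right) - \sum_{i=2}^{\alpha}\sum_{x=1}^{f_1-1} x f_i \binom{f_i+x}{x}\left(\frac{q_1}{p_1+p_i}\right)\left(\frac{p_1}{p_1+p_i}\right)^{x}\left(\frac{p_i}{p_1+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-2}{J-2}} CD^{(J-2,\,\alpha-J)}_{\left(\frac{\mathbf{p}'_{1,i}}{p_1+p_i},\,\frac{\mathbf{p}''_{1,i}}{p_1+p_i}\right)}\left(\mathbf{f}'_{1,i},\,\mathbf{f}''_{1,i};\; 1+x+f_i\right). \qquad (2.36)$$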
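For Case 3, we obtain the single summand

$$T_3 = \sum_{i=2}^{\alpha}\sum_{x=1}^{f_1-1} x f_i \binom{f_i+x}{x}\left(\frac{q_1}{p_1+p_i}\right)\left(\frac{p_1}{p_1+p_i}\right)^{x}\left(\frac{p_i}{p_1+p_i}\right)^{f_i} \times \sum^{\binom{\alpha-2}{J-1}} CD^{(J-1,\,\alpha-J-1)}_{\left(\frac{\mathbf{p}'_{1,i}}{p_1+p_i},\,\frac{\mathbf{p}''_{1,i}}{p_1+p_i}\right)}\left(\mathbf{f}'_{1,i},\,\mathbf{f}''_{1,i};\; 1+x+f_i\right). \qquad (2.37)$$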
In general, for the Jth-ordered problem we need T_1 + T_2 + T_3; however, for the soonest (J = 1) problem we need only T_1 + T_3, and for the latest (J = α) problem we need only T_1 + T_2.
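For the mixed pair, the same α = 2 soonest setting gives E{X_1Z_1} = T_1 + T_3 from (2.35) and (2.37). An independent route conditions on the observations in cells 1 and 2: each of them is preceded, on average, by q_1/(p_1 + p_2) observations of the quota-free cell, independently of the order of cells 1 and 2, so E{X_1Z_1} = q_1/(p_1 + p_2) E{X_1(X_1 + X_2)}, which a recursion on (x_1, x_2) evaluates. A sketch with hypothetical names:

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def d_pair(r_other, m, share):
    """D^(1)(r_other; m) for one competing cell: the counting cell, with
    share `share` of the pair, records m observations while the other cell
    still has fewer than r_other."""
    return sum(comb(m - 1 + x, x) * share**m * (1 - share) ** x for x in range(r_other))


def e_x1z1_formula(f1, f2, p1, p2, q1):
    """T1 + T3 of (2.35) and (2.37) for alpha = 2 and J = 1."""
    s = p1 / (p1 + p2)
    T1 = f1**2 * q1 / p1 * d_pair(f2, f1 + 1, s)
    T3 = sum(x * f2 * comb(f2 + x, x) * q1 / (p1 + p2) * s**x * (1 - s) ** f2
             for x in range(1, f1))
    return T1 + T3


def e_x1z1_recursive(f1, f2, p1, p2, q1):
    """E{X1 Z1} = q1 / (p1 + p2) * E{X1 (X1 + X2)}, the latter by recursion."""
    s = p1 / (p1 + p2)

    @lru_cache(maxsize=None)
    def v(x1, x2):
        after1 = (x1 + 1) * (x1 + 1 + x2) if x1 + 1 == f1 else v(x1 + 1, x2)
        after2 = x1 * (x1 + x2 + 1) if x2 + 1 == f2 else v(x1, x2 + 1)
        return s * after1 + (1 - s) * after2

    return q1 / (p1 + p2) * v(0, 0)


sixth = Fraction(1, 6)
print(e_x1z1_formula(3, 2, sixth, sixth, sixth), e_x1z1_recursive(3, 2, sixth, sixth, sixth))
```

For the fair-die case with f_1 = 3 and f_2 = 2, both give 45/16.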
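3. Derivation of the gpgf for the general case (Case 3)

As in Section 2 of [2], a complete derivation of the gpgf for the soonest problem requires initially writing [Σ_{j=1}^{β} r_j − (β − 1)] ∏_{i=1}^{α} f_i equations and gradually reducing the number of equations down to one. The resulting equation is identical in form to equation (2.1) of [2], namely

$$\phi^{(\mathbf{y},\mathbf{x})}(\mathbf{t}) = \sum_{h=1}^{\alpha} x_h F_h^{f_h} \sum_{\{i_m:\, m \neq h\}} \left(\prod_{k \neq h} F_k^{i_k}\right) \genfrac{[}{]}{0pt}{}{f_h - 1 + \sum_{m \neq h} i_m}{f_h - 1,\ i_1, \dots, i_{h-1},\ i_{h+1}, \dots, i_\alpha} + \left(\sum_{j=1}^{\beta} y_j R_j\right) \sum_{\{i_m\}} \left(\prod_{k=1}^{\alpha} F_k^{i_k}\right) \genfrac{[}{]}{0pt}{}{\sum_{m=1}^{\alpha} i_m}{i_1, \dots, i_\alpha}, \qquad (3.1)$$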
where Σ_{i_m} indicates a multiple summation (of α sums, or of α − 1 sums when i_h is absent), each i_m running from 0 to f_m − 1, and the bracketed quantities are the usual multinomial coefficients. Here, x = (x_1, x_2, ..., x_α) (resp., y = (y_1, y_2, ..., y_β)) are markers (i.e., one of them is unity and all the rest are zero) for stopping in the soonest problem with any one of the α (resp., β) cells with frequency (resp., run) quotas. In order to define the F and R quantities (which are not the same as in (2.1) of [2]), we first define, for j = 1, 2, ..., β,

$$Q_j = \frac{1-(p'_j)^{r_j}}{1-p'_j}, \qquad \theta = \prod_{j=1}^{\beta} Q_j, \qquad \theta^* = \sum_{j=1}^{\beta} \prod_{\substack{h=1\\ h\neq j}}^{\beta} Q_h \quad (=1 \text{ for } \beta = 1). \qquad (3.2)$$
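In terms of these and the gpgf parameter t, we have for i = 1, 2, ..., α

$$F_i = \frac{p_i\,\theta}{\theta^* - \theta\left(\beta - 1 + \sum_{k=1}^{\gamma} q_k t_k\right)}, \qquad (3.3)$$

and for j = 1, 2, ..., β

$$R_j = \frac{(p'_j)^{r_j} \prod_{h=1}^{\beta} Q_h}{Q_j\left[\theta^* - \theta\left(\beta - 1 + \sum_{k=1}^{\gamma} q_k t_k\right)\right]}. \qquad (3.4)$$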
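Note that θ and θ* do not depend on the parameter t. The same result as in (3.1) can be written with Dirichlet integrals in the form

$$\phi^{(\mathbf{y},\mathbf{x})}(\mathbf{t}) = \sum_{i=1}^{\alpha} x_i \left(A_i^{(i)}\right)^{f_i} D^{(\alpha-1)}_{\mathbf{A}_i}\left(\mathbf{f}_{(i)};\, f_i\right) + \frac{d_{00}}{d_0}\, D^{(\alpha)}_{\mathbf{A}_0}\left(\mathbf{f};\, 1\right) \sum_{j=1}^{\beta} y_j R_j, \qquad (3.5)$$

where d_0, d_00 and d_i for i = 1, 2, ..., α are defined by

$$d_{00} = \theta^* - \theta\left(\beta - 1 + \sum_{k=1}^{\gamma} q_k t_k\right), \qquad d_0 = \theta^* - \theta\left(\beta - 1 + \sum_{i=1}^{\alpha} p_i + \sum_{k=1}^{\gamma} q_k t_k\right),$$
$$d_i = \theta^* - \theta\left(\beta - 1 + \sum_{h \neq i} p_h + \sum_{k=1}^{\gamma} q_k t_k\right). \qquad (3.6)$$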
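For (3.5) we also need, for i = 1, 2, ..., α and h = 1, 2, ..., α,

$$A_i^{(h)} = \frac{p_h\,\theta}{d_i}, \qquad \mathbf{A}_i = \left(A_i^{(1)}, \dots, A_i^{(i-1)}, A_i^{(i+1)}, \dots, A_i^{(\alpha)}\right), \qquad (3.7)$$

and, in the same way, A_0^(h) = p_h θ/d_0 and A_0 = (A_0^(1), ..., A_0^(α)) for the second term of (3.5).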
All these Dirichlet integrals in (3.5) are obtainable from [4].
3.1. Moment results for Case 2: Only cells of types 2 and 3 present

For Case 2, where only run-quota and quota-free cells are present, we give, for the soonest problem, moment results for the total frequencies at stopping time in the quota-free cells. From the general result (3.1), the gpgf for this special case is

$$\phi^{(\mathbf{y})}(\mathbf{t}) = \sum_{j=1}^{\beta} y_j R_j. \qquad (3.8)$$
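From (1.1), we have Σ_{j=1}^{β} p'_j + Σ_{k=1}^{γ} q_k = 1. By taking derivatives with respect to t in (3.8), we obtain

$$E(Z_k) = \frac{q_k\,\theta}{\theta^* - \theta\left(\beta - 1 + \sum_{\ell=1}^{\gamma} q_\ell\right)} = q_k\, E(N^{(1)}), \qquad k = 1, 2, \dots, \gamma, \qquad (3.9)$$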
where N^(1) is the total number of observations needed for stopping in the soonest problem. Also, using only the first moments of Z_k,

$$\operatorname{Var}(Z_k) = E^2(Z_k) + E(Z_k), \qquad (3.10)$$

$$\operatorname{Cov}(Z_h, Z_k) = E(Z_h)\,E(Z_k) > 0 \quad \forall\, h \neq k, \qquad (3.11)$$

$$\operatorname{Corr}(Z_h, Z_k) = \left[\frac{E(Z_h)\,E(Z_k)}{[1 + E(Z_h)][1 + E(Z_k)]}\right]^{1/2}. \qquad (3.12)$$
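Relation (3.10), Var(Z_k) = E²(Z_k) + E(Z_k), is the variance law of a geometric count. A small simulation of the die example that follows can be used to check it; we take a run to mean r consecutive tosses showing the same face, so that any other face, including 5 or 6, breaks it. The function name, seed and replication count are arbitrary choices for this sketch; for r = 2 the estimates should be close to the exact values in Table 2 below, up to Monte Carlo error of a few hundredths.

```python
import random


def simulate_runs(r, n_rep=100_000, seed=7):
    """Fair die; stop at the first run of r identical faces among faces 1-4
    (run quotas); faces 5 and 6 are quota-free. Returns simulated E(N),
    E(Z1), Var(Z1) and Cov(Z1, Z2)."""
    rng = random.Random(seed)
    tot = [0, 0, 0, 0, 0]                 # sums of N, Z1, Z2, Z1^2, Z1*Z2
    for _ in range(n_rep):
        n = z1 = z2 = 0
        last, run = None, 0
        while True:
            face = rng.randint(1, 6)
            n += 1
            run = run + 1 if face == last else 1   # any other face breaks the run
            last = face
            if face <= 4:
                if run == r:
                    break
            elif face == 5:
                z1 += 1
            else:
                z2 += 1
        for k, val in enumerate((n, z1, z2, z1 * z1, z1 * z2)):
            tot[k] += val
    e_n, e_z1, e_z2, e_z1sq, e_z1z2 = (t / n_rep for t in tot)
    return e_n, e_z1, e_z1sq - e_z1**2, e_z1z2 - e_z1 * e_z2


print(simulate_runs(r=2))   # exact values: 10.5, 1.75, 4.8125, 3.0625
```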
Example. Consider tossing a single fair die until we observe a run of length r of any one (soonest problem) of four specified faces. Here we have two quota-free cells, β = 4, γ = 2, r_j = r (j = 1, 2, 3, 4) and q_1 = q_2 = p'_1 = p'_2 = p'_3 = p'_4 = 1/6. The results are given in Table 2 below. These numerical values are consistent with the results obtained above; in particular, the covariance and correlation are positive, as they must always be by (3.11) and (3.12).
Table 2. Run problem: moments of the total frequencies for the quota-free cells

| | r = 1 | r = 2 | r = 3 |
|---|---|---|---|
| E(N^(1)) | 1.5 | 10.5 | 64.5 |
| E(Z_k) (k = 1, 2) | 0.25 | 1.75 | 10.75 |
| Var(Z_k) (k = 1, 2) | 0.3125 | 4.8125 | 126.3125 |
| Cov(Z_1, Z_2) | 0.0625 | 3.0625 | 115.5625 |
| Corr(Z_1, Z_2) | 0.2 | 0.6363 | 0.9149 |
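Table 2 can be recomputed exactly from (3.2) and (3.9) to (3.12). The sketch below also evaluates R_j at t = 1 from (3.4) and checks that they sum to 1, as a gpgf must when stopping is certain. Function and argument names are illustrative; exact fractions are used throughout except for the square root in (3.12).

```python
from fractions import Fraction
from math import prod, sqrt


def run_problem_moments(r, p_run, q):
    """Soonest run-quota problem (Case 2) via (3.2), (3.4) and (3.9)-(3.12).

    p_run: probabilities p'_j of the beta run-quota cells (common run length r);
    q: probabilities q_k of the quota-free cells (together summing to 1).
    Returns E(N), E(Z_1), Var(Z_1), Cov(Z_1, Z_2), Corr(Z_1, Z_2).
    """
    beta = len(p_run)
    Q = [(1 - p**r) / (1 - p) for p in p_run]            # Q_j of (3.2)
    theta = prod(Q)
    theta_star = sum(theta / Qj for Qj in Q)             # sum_j prod_{h != j} Q_h
    denom = theta_star - theta * (beta - 1 + sum(q))     # denominator at t = 1
    R = [p**r * theta / Qj / denom for p, Qj in zip(p_run, Q)]   # R_j(1) of (3.4)
    assert sum(R) == 1, "phi(1) must equal 1"
    EN = theta / denom                                   # E(N^(1)) from (3.9)
    EZ = [qk * EN for qk in q]                           # (3.9)
    var1 = EZ[0] ** 2 + EZ[0]                            # (3.10)
    cov = EZ[0] * EZ[1]                                  # (3.11)
    corr = sqrt(cov / ((1 + EZ[0]) * (1 + EZ[1])))       # (3.12)
    return EN, EZ[0], var1, cov, corr


die = Fraction(1, 6)
for r in (1, 2, 3):
    EN, EZ, var1, cov, corr = run_problem_moments(r, [die] * 4, [die] * 2)
    print(r, float(EN), float(EZ), float(var1), float(cov), round(corr, 4))
```

For r = 1, 2, 3 it reproduces the columns of Table 2; the r = 2 correlation is 7/11 = 0.63636..., which the table lists truncated as 0.6363.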
4. Some applications of previous results
A simple example shows how the previous results can be used in a two-stage setting to make the total frequencies of two different cells uncorrelated at stopping time. Consider a multinomial setting with six cells and let n denote a fixed sample size for the first stage. Suppose the cells are numbered 1 to 6, that cells #1 and #2 have frequency quotas f_1 and f_2, respectively, and that the remaining four cells are quota-free. In accord with our notation above, we use p_1 and p_2 for the cell probabilities of the first two cells and q_1, q_2, q_3 and q_4 for those of the last four cells. In the second stage, we wait until either cell #1 has frequency f_1 or cell #2 has frequency f_2, whichever comes sooner. Our interest is the covariance Cov(1, 3) of the total frequencies of cells 1 and 3 over both stages (i.e., at stopping time). From the results in Section 2.3, this is given by

$$\operatorname{Cov}(1,3) = -p_1q_1\left(n + W_1^2\right) + \frac{f_1^2\,q_1}{p_1}\, D^{(1)}_{p_2/p_1}(f_2;\, f_1+1) + \frac{f_2(f_2+1)\,p_1 q_1}{p_2^2}\, D^{(1)}_{p_1/p_2}(f_1-1;\, f_2+2). \qquad (4.1)$$
This formula is valid only for f_1 > 1; but, if D is replaced by C, then it also holds for f_1 = 1, as in the example below. We now apply this result to the case of a fair 6-sided die, i.e., with all six probabilities equal to 1/6.

Example: f_1 = f_2 = 1 (latest problem with two frequency quotas). Replacing the D's by C's in (4.1) and also in (2.3), we obtain
$$\operatorname{Cov}(1,3) = -\frac{1}{36}\left[n + \left(12\,C_1^{(1)}(1;2)\right)^2\right] + C_1^{(1)}(1;2) + 2 = \frac{-n + 18}{36}. \qquad (4.2)$$
Hence we need n = 18 observations in the first (multinomial) stage to make the total frequencies for cells 1 and 3 uncorrelated at stopping time.
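The arithmetic behind (4.2) can be scripted. The sketch assumes, as in the example, f_1 = f_2 = 1 with D replaced by C, so that C_a^(1)(1; m) = 1 − (1 + a)^(−m) and the last C (first argument 0) equals 1. Since Cov(1, 3) is linear in n with slope −p_1q_1, the n that makes it vanish is Cov(1, 3) at n = 0 divided by p_1q_1. The function names are hypothetical.

```python
from fractions import Fraction


def C1(a, m):
    """C_a^(1)(1; m) = 1 - (1 + a)^(-m): with unit quotas, the competing cell
    (ratio a) is observed before the counting cell records m observations."""
    return 1 - 1 / (1 + a) ** m


def cov_cells_1_3(n, p1, p2, q1):
    """Cov(1, 3) from (4.1) for f1 = f2 = 1 with D replaced by C (latest
    problem); the last C, with first argument 0, equals 1."""
    f1 = f2 = 1
    W1 = f1 / p1 * C1(p2 / p1, f1 + 1) + f2 / p2 * C1(p1 / p2, f2 + 1)  # (2.3), D -> C
    return (-p1 * q1 * (n + W1**2)
            + f1**2 * q1 / p1 * C1(p2 / p1, f1 + 1)
            + f2 * (f2 + 1) * p1 * q1 / p2**2)


sixth = Fraction(1, 6)
# Cov(1, 3) is linear in n with slope -p1*q1, so its root is Cov(0) / (p1*q1).
n_star = cov_cells_1_3(0, sixth, sixth, sixth) / (sixth * sixth)
print(n_star)
```

For the fair die it prints 18, in agreement with (4.2).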
References

[1] Ebneshahrashoob, M. and M. Sobel, Sooner and later waiting time problems for Bernoulli trials: Frequency and run quotas, Statist. Probab. Lett. 9 (1990) 5-11.
[2] Sobel, M. and M. Ebneshahrashoob, Quota sampling for multinomial via Dirichlet, J. Statist. Plann. Inference 33 (1992) 157-164.
[3] Sobel, M., V.R.R. Uppuluri and K. Frankowski, Dirichlet distribution-type 1, in: Selected Tables in Mathematical Statistics, Vol. 4 (IMS and AMS, Providence, RI, 1977).
[4] Sobel, M., V.R.R. Uppuluri and K. Frankowski, Dirichlet integrals of type 2 and their applications, in: Selected Tables in Mathematical Statistics, Vol. 9 (IMS and AMS, Providence, RI, 1985).