On k-match problems


Journal of Statistical Planning and Inference 109 (2003) 67 – 79

www.elsevier.com/locate/jspi

Katuomi Hirano a,*, Sigeo Aki b

a The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
b Department of Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama-cho, Toyonaka 560-8531, Japan

Abstract

Several waiting time random variables for a duplication within a memory window of size $k$ in a sequence of $\{1, 2, \ldots, m\}$-valued random variables are investigated. The exact distributions of the waiting time random variables are derived by the method of conditional probability generating functions. In particular, the exact distribution of the waiting time for the first k-match is obtained when the underlying sequence is generated by higher order Markov dependent trials. Examples of numerical calculations are also given. © 2002 Elsevier Science B.V. All rights reserved.

MSC: 62E15; 60E99

Keywords: Waiting time; Duplication; Discrete distribution; Markov chain; Probability generating function; Urn model

This research was partially supported by the ISM Cooperative Research Program (98-ISM·CRP-A8) of the Institute of Statistical Mathematics.
* Corresponding author (K. Hirano).
0378-3758/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-3758(02)00299-9

1. Introduction

Suppose that an urn contains $m$ balls $B_1, B_2, \ldots, B_m$ and that the ball $B_i$ bears the number $i$, $i = 1, 2, \ldots, m$. Suppose that these balls are sampled one at a time with replacement, so that a sequence $X_1, X_2, \ldots$ of i.i.d. $\{1, 2, \ldots, m\}$-valued random variables is generated. Arnold (1972) derived the distribution of the waiting time until the first k-match (i.e., until a ball is drawn that duplicates one of the $k$ preceding balls drawn). To be specific, the exact distribution of

\[ T \equiv \inf\{\, i \ge 2 : X_i = X_\ell \ \text{for some}\ \ell = \max\{i-k, 1\}, \ldots, i-1 \,\} \]

was obtained. Some generalizations of the k-match problem and Poisson approximations of the corresponding distributions have been treated (see, e.g., Herzog et al., 1998; Burghardt et al., 1994).

Arnold (1972) provided the following example of the problem: assume that there are $m$ servers, that customers arrive at one-minute intervals, that service takes $k$ minutes and that each customer chooses a server at random; if the chosen server is occupied, the customer is lost. Then the first k-match means the first loss of a customer. In the k-match problem, if we set $k = \infty$, the problem reduces to the birthday problem ($m = 365$). If $k = \infty$ and the probability that each server is chosen is not identical, then it becomes the surname problem (see Mase, 1992). When $k = 1$, the k-match means a run of length 2 and the problem becomes a special case of the problem proposed by Aki (1992).

Considering the example of customer loss, the following new problems are of interest: what is the exact distribution of the waiting time for the first k-match if $k$ is finite and the probability that each server is chosen is not identical? Moreover, can we obtain the exact distribution of the waiting time for the first k-match if the sequence $X_1, X_2, \ldots$ has Markov dependence? More specifically, what is the exact distribution of the first k-match if each customer can see which server was chosen by the customer just before him and never chooses it? This problem will be treated in Section 3.

In Section 2, we treat new waiting time problems such as the second k-match and the first occurrence of two matches within a memory window of size $k$, besides the first k-match problem. In Section 3, non-i.i.d. cases are treated. The main tool is to solve a system of equations of conditional probability generating functions (p.g.f.'s). In the study of discrete distribution theory, analysing the functional relations of the conditional p.g.f.'s has been a powerful tool (see, e.g., Rao et al., 1980; Ebneshahrashoob and Sobel, 1990; Hirano and Aki, 1999).
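To make the definition of the waiting time T concrete, the following minimal sketch scans a finite list of observations and returns the first index at which a k-match occurs. The function name `first_k_match_time` and the list-based interface are illustrative assumptions and do not come from the paper.

```python
def first_k_match_time(xs, k):
    """Return the smallest 1-based index i >= 2 such that X_i equals one of
    X_{max(i-k,1)}, ..., X_{i-1}, or None if no k-match occurs in xs."""
    for i in range(2, len(xs) + 1):
        # xs is 0-based, so X_j is xs[j - 1]
        window = xs[max(i - k, 1) - 1 : i - 1]
        if xs[i - 1] in window:
            return i
    return None


demo = [3, 1, 2, 3, 2]
t2 = first_k_match_time(demo, 2)   # window of size 2
t3 = first_k_match_time(demo, 3)   # window of size 3
```

In the customer-loss reading, `demo` lists the servers chosen minute by minute: with service time k = 3 the fourth customer is the first one lost, while with k = 2 the first loss happens one minute later.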

2. Derivation of exact distributions

2.1. First k-match

We begin by deriving the exact distribution of the waiting time until the first k-match by the method of conditional p.g.f.'s. Let $\phi(t)$ be the p.g.f. of the distribution of the waiting time until the first k-match. When $k \ge m$, the following system of equations holds:

\[
\begin{aligned}
\phi(t) &= t\,\phi_1(t),\\
\phi_1(t) &= \frac{1}{m}\,t + \frac{m-1}{m}\,t\,\phi_2(t),\\
\phi_2(t) &= \frac{2}{m}\,t + \frac{m-2}{m}\,t\,\phi_3(t),\\
&\ \,\vdots\\
\phi_{m-1}(t) &= \frac{m-1}{m}\,t + \frac{1}{m}\,t\,\phi_m(t),\\
\phi_m(t) &= t,
\end{aligned}
\tag{1}
\]

where $\phi_i(t)$ is the p.g.f. of the conditional distribution of the waiting time from the $i$th point of time given that $X_1, X_2, \ldots, X_i$ have already been observed and no k-match has been observed up to $X_i$. By solving this system, we have

\[
\phi(t) = \frac{1}{m}\,t^2 + \frac{m-1}{m}\,\frac{2}{m}\,t^3 + \cdots + \frac{[m]_{\ell-1}}{m^{\ell}}\,(\ell-1)\,t^{\ell} + \cdots + \frac{m!}{m^{m+1}} \cdot m\,t^{m+1},
\]

where $[a]_\ell = a(a-1)\cdots(a-\ell+1)$. When $k \ge m$, Arnold (1972) obtained

\[ P(T > \ell) = \frac{[m]_\ell}{m^\ell}. \]

Then, we have

\[ P(T = \ell) = P(T > \ell-1) - P(T > \ell) = \frac{[m]_{\ell-1}}{m^\ell}\,(\ell-1). \]
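As a quick numerical check of the case k ≥ m, the sketch below compares the closed form for P(T = ℓ) with the probabilities obtained by unrolling system (1) one step at a time, using exact rational arithmetic. The function names are illustrative assumptions; `math.perm` (Python 3.8+) is used for the falling factorial.

```python
from fractions import Fraction
from math import perm  # perm(m, l) equals the falling factorial [m]_l


def pmf_closed_form(m):
    """P(T = l) = [m]_{l-1} (l - 1) / m^l for l = 2, ..., m + 1 (case k >= m)."""
    return {l: Fraction(perm(m, l - 1) * (l - 1), m ** l) for l in range(2, m + 2)}


def pmf_from_system(m):
    """Unroll system (1): with i distinct values seen, the next draw matches
    with probability i/m, otherwise (probability (m - i)/m) we move to i + 1."""
    pmf, no_match_yet = {}, Fraction(1)
    for i in range(1, m + 1):
        pmf[i + 1] = no_match_yet * Fraction(i, m)
        no_match_yet *= Fraction(m - i, m)
    return pmf


# Birthday problem (m = 365, no window limit): probability of a duplicate
# among the first 23 draws.
birthday = pmf_closed_form(365)
p_23 = sum(birthday[l] for l in range(2, 24))
```

Both routes give the same probabilities term by term, and for m = 365 the cumulative probability first exceeds 1/2 at 23 draws, the familiar birthday-problem threshold.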

Thus, our result given by the method of conditional p.g.f.'s agrees with that of Arnold (1972).

Next, we study the case $k < m$. Define $\phi_i(t)$ ($i = 1, 2, \ldots, k-1$) as in the previous case. Suppose that we have observed at least $k$ $X$'s and that the $k$ most recent observations are different from each other. Given this condition, let $\phi_k(t)$ be the p.g.f. of the conditional distribution of the waiting time for the first k-match from this point of time. Then we have the following system of equations of the conditional p.g.f.'s:

\[
\begin{aligned}
\phi(t) &= t\,\phi_1(t),\\
\phi_1(t) &= \frac{1}{m}\,t + \frac{m-1}{m}\,t\,\phi_2(t),\\
\phi_2(t) &= \frac{2}{m}\,t + \frac{m-2}{m}\,t\,\phi_3(t),\\
&\ \,\vdots\\
\phi_k(t) &= \frac{k}{m}\,t + \frac{m-k}{m}\,t\,\phi_k(t).
\end{aligned}
\tag{2}
\]

The only difference between (2) and (1) is in the last equation. From the last equation of (2), we obtain

\[ \phi_k(t) = \frac{(k/m)\,t}{1 - ((m-k)/m)\,t}, \]

which is the p.g.f. of a geometric distribution. Then the solution of (2) can be written as

\[
\begin{aligned}
\phi(t) = {} & \frac{1}{m}\,t^2 + \frac{m-1}{m}\,\frac{2}{m}\,t^3 + \frac{m-1}{m}\,\frac{m-2}{m}\,\frac{3}{m}\,t^4 + \cdots\\
& + \frac{m-1}{m}\,\frac{m-2}{m} \cdots \frac{m-(k-2)}{m}\,\frac{k-1}{m}\,t^k\\
& + \frac{m-1}{m}\,\frac{m-2}{m} \cdots \frac{m-(k-1)}{m}\,\frac{k}{m}\,t^{k+1}\,\frac{1}{1 - ((m-k)/m)\,t}.
\end{aligned}
\tag{3}
\]

In this case, Arnold (1972) derived

\[ P(T > \ell) = \frac{[m]_{k+1}}{m^{k+1}} \left(1 - \frac{k}{m}\right)^{\ell-(k+1)} \qquad (\ell \ge k+1). \]

If we write $p = 1/m$ in (3), we obtain

\[
\begin{aligned}
\phi(t) = {} & p\,t^2 + (1-p)\,2p\,t^3 + (1-p)(1-2p)\,3p\,t^4 + \cdots\\
& + (1-p)(1-2p)\cdots(1-(k-2)p)\,(k-1)p\,t^k\\
& + \sum_{j=0}^{\infty} (1-p)(1-2p)\cdots(1-(k-1)p)\,kp\,(1-kp)^j\,t^{k+j+1}.
\end{aligned}
\]

Hence, for $\ell \ge k+1$ we have

\[
\begin{aligned}
P(T > \ell) &= \sum_{j=\ell-k}^{\infty} (1-p)(1-2p)\cdots(1-(k-1)p)\,kp\,(1-kp)^j\\
&= (1-p)(1-2p)\cdots(1-(k-1)p)\,kp \cdot \frac{(1-kp)^{\ell-k}}{kp}\\
&= \frac{[m]_{k+1}}{m^{k+1}} \left(1 - \frac{k}{m}\right)^{\ell-k-1}.
\end{aligned}
\]
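The agreement for k < m can also be confirmed numerically. The sketch below unrolls system (2) into the probabilities P(T = n) and compares the resulting tail probabilities with Arnold's formula in exact arithmetic; the function names and the choice k = 3, m = 5 are illustrative assumptions.

```python
from fractions import Fraction
from math import perm  # perm(m, l) equals the falling factorial [m]_l


def first_match_pmf(k, m, n_max):
    """P(T = n) for n = 0..n_max (i.i.d. uniform draws, k < m), from system (2)."""
    pmf = [Fraction(0)] * (n_max + 1)
    no_match_yet = Fraction(1)
    for n in range(2, n_max + 1):
        i = min(n - 1, k)                    # distinct values in the window before draw n
        pmf[n] = no_match_yet * Fraction(i, m)
        no_match_yet *= Fraction(m - i, m)
    return pmf


def arnold_survival(k, m, l):
    """P(T > l) = [m]_{k+1} / m^(k+1) * (1 - k/m)^(l - k - 1), valid for l >= k + 1."""
    return Fraction(perm(m, k + 1), m ** (k + 1)) * Fraction(m - k, m) ** (l - k - 1)


k, m = 3, 5
pmf = first_match_pmf(k, m, 12)
tails = {l: 1 - sum(pmf[: l + 1]) for l in range(k + 1, 13)}
```

Each entry of `tails` equals the corresponding value of `arnold_survival`, which is the statement P(T > ℓ) = ([m]_{k+1}/m^{k+1})(1 − k/m)^{ℓ−k−1} for ℓ ≥ k + 1.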

Therefore, our result agrees with that of Arnold (1972).

2.2. Second k-match

In this subsection, we study the distribution of the waiting time for the second k-match. In the example of customer loss, the second k-match means the second customer loss. Clearly, the distribution is different from the convolution of the distribution of the waiting time for the first k-match with itself.

Let $\psi(t)$ be the p.g.f. of the distribution of the waiting time for the second k-match. For $i = 1, 2, \ldots, k-1$, let $\psi_i(t)$ be the p.g.f. of the conditional distribution of the waiting time for the second k-match from the $i$th point of time given that $X_1, X_2, \ldots, X_i$ have already been observed and the first k-match has not been observed up to $X_i$. Suppose that we have observed at least $k$ $X$'s and that we have not observed the first k-match up to that point of time. Given this condition, let $\psi_k(t)$ be the p.g.f. of the conditional distribution of the waiting time for the second k-match from this point of time.

For $i = 1, 2, \ldots, k-1$, let $\psi_i^{(h,\ell)}(t)$ be the p.g.f. of the conditional distribution of the waiting time for the second k-match given that we have observed up to $X_i$ and that the first k-match has already occurred between the $h$th and $\ell$th observations going back to the past from the $i$th point of time (i.e., the first k-match has occurred between $X_{i-\ell+1}$ and $X_{i-h+1}$). Suppose that we have observed at least $k$ $X$'s and that we have already observed the first k-match between the $h$th and $\ell$th observations going back to the past from that point of time. Given this condition, let $\psi_k^{(h,\ell)}(t)$ be the p.g.f. of the conditional distribution of the waiting time for the second k-match from that point of time. Suppose that we have observed at least $k$ $X$'s, that we have already observed the first k-match and that the $k$ most recent observations are different from each other. Given this condition, let $\phi_k(t)$ be the p.g.f. of the conditional distribution of the waiting time for the second k-match from that point of time. By considering the condition one step ahead from every condition, we obtain the next proposition.

Proposition 2.1. Under the above conditions, the following result holds in three separate cases.

Case 1: $k = m$. The following relations hold for the p.g.f.'s:

\[
\begin{aligned}
\psi(t) &= t\,\psi_1(t),\\
\psi_i(t) &= \frac{1}{m}\,t \sum_{j=2}^{i+1} \psi_{i+1}^{(1,j)}(t) + \frac{m-i}{m}\,t\,\psi_{i+1}(t) && \text{for } i = 1, 2, \ldots, m-1,\\
\psi_m(t) &= \frac{1}{m}\,t \sum_{j=2}^{m} \psi_m^{(1,j)}(t) + \frac{1}{m}\,t\,\phi_m(t),\\
\phi_m(t) &= t,\\
\psi_m^{(1,i)}(t) &= \frac{m-1}{m}\,t + \frac{1}{m}\,t\,\psi_m^{(2,i+1)}(t) && \text{for } i = 2, 3, \ldots, m-1,\\
\psi_i^{(h,\ell)}(t) &= \frac{i-1}{m}\,t + \frac{m-i+1}{m}\,t\,\psi_{i+1}^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell \le i;\ i = 2, 3, \ldots, m-1,\\
\psi_m^{(h,m)}(t) &= \frac{m-1}{m}\,t + \frac{1}{m}\,t\,\phi_m(t) && \text{for } 1 \le h < m.
\end{aligned}
\tag{4}
\]

Case 2: $k \ge m+1$. The following relations hold for the p.g.f.'s:

\[
\begin{aligned}
\psi(t) &= t\,\psi_1(t),\\
\psi_i(t) &= \frac{1}{m}\,t \sum_{j=2}^{i+1} \psi_{i+1}^{(1,j)}(t) + \frac{m-i}{m}\,t\,\psi_{i+1}(t) && \text{for } i = 1, 2, \ldots, m-1,\\
\psi_m(t) &= \frac{1}{m}\,t \sum_{j=2}^{m+1} \psi_{m+1}^{(1,j)}(t),\\
\psi_i^{(h,\ell)}(t) &= \frac{i-1}{m}\,t + \frac{m-i+1}{m}\,t\,\psi_{i+1}^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell \le i;\ i = 2, 3, \ldots, m,\\
\psi_{m+1}^{(h,\ell)}(t) &= t && \text{for } 1 \le h < \ell \le m+1.
\end{aligned}
\tag{5}
\]

Table 1
Mean of waiting time

(k, m)    First k-match      Second k-match
(2, 5)    4                  34/5 = 6.8
(3, 5)    18/5 = 3.6         2149/375 = 5.73
(4, 5)    88/25 = 3.52       33651/6250 = 5.38
(3, 4)    13/4 = 3.25        487/96 = 5.07
(3, 6)    71/18 = 3.94       6211/972 = 6.39

Case 3: $k < m$. The following relations hold for the p.g.f.'s:

\[
\begin{aligned}
\psi(t) &= t\,\psi_1(t),\\
\psi_i(t) &= \frac{1}{m}\,t \sum_{j=2}^{i+1} \psi_{i+1}^{(1,j)}(t) + \frac{m-i}{m}\,t\,\psi_{i+1}(t) && \text{for } i = 1, 2, \ldots, k-1,\\
\psi_k(t) &= \frac{1}{m}\,t\,\phi_k(t) + \frac{1}{m}\,t \sum_{j=2}^{k} \psi_k^{(1,j)}(t) + \frac{m-k}{m}\,t\,\psi_k(t),\\
\phi_k(t) &= \frac{k}{m}\,t + \frac{m-k}{m}\,t\,\phi_k(t),\\
\psi_i^{(h,\ell)}(t) &= \frac{i-1}{m}\,t + \frac{m-i+1}{m}\,t\,\psi_{i+1}^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell \le i < k,\\
\psi_k^{(h,\ell)}(t) &= \frac{k-1}{m}\,t + \frac{m-k+1}{m}\,t\,\psi_k^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell < k,\\
\psi_k^{(h,k)}(t) &= \frac{k-1}{m}\,t + \frac{m-k+1}{m}\,t\,\phi_k(t) && \text{for } 1 \le h < k.
\end{aligned}
\tag{6}
\]

From the above equations, we can derive the p.g.f.'s of the waiting time random variables for specific values of $k$ and $m$ by using computer algebra systems. Further, by differentiating the p.g.f.'s we easily obtain the mean and variance of the distributions. Tables 1 and 2 list the mean and variance of the waiting time for specific values of $k$ and $m$.

Table 2
Variance of waiting time

(k, m)    First k-match       Second k-match
(2, 5)    4                   194/25 = 7.76
(3, 5)    128/75 = 1.71       132808/46875 = 2.83
(4, 5)    781/625 = 1.25      59660199/39062500 = 1.53
(3, 4)    49/48 = 1.02        13999/9216 = 1.52
(3, 6)    845/324 = 2.61      4340195/944784 = 4.59
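The means in Table 1 can be reproduced without a computer algebra system. Differentiating each equation of (2) and (6) at t = 1 turns the systems into linear recursions for the conditional means, which the following sketch evaluates with exact fractions for the case k < m. The function names are illustrative assumptions, and this is only one possible way to organise the computation, not the authors' procedure.

```python
from fractions import Fraction


def mean_first_match(k, m):
    """Mean waiting time for the first k-match (k < m), from system (2)."""
    mean = Fraction(m, k)                      # geometric state: match probability k/m
    for i in range(k - 1, 0, -1):
        mean = 1 + Fraction(m - i, m) * mean   # no match with probability (m - i)/m
    return 1 + mean


def mean_second_match(k, m):
    """Mean waiting time for the second k-match (k < m), from system (6)."""
    after = Fraction(m, k)   # first match has left the window, window all distinct

    def pair_mean(i, l):
        # First match still inside the window; l is the position (counted
        # backwards) of the older matched observation, i the level of the state.
        stay = Fraction(m - i + 1, m)
        if i < k:
            return 1 + stay * pair_mean(i + 1, l + 1)
        if l < k:
            return 1 + stay * pair_mean(k, l + 1)
        return 1 + stay * after

    # State "at least k observations, no match yet": solve its self-referential equation.
    mean = (1 + (after + sum(pair_mean(k, j) for j in range(2, k + 1))) / m) * Fraction(m, k)
    for i in range(k - 1, 0, -1):
        mean = 1 + sum(pair_mean(i + 1, j) for j in range(2, i + 2)) / m + Fraction(m - i, m) * mean
    return 1 + mean


table1 = {(k, m): (mean_first_match(k, m), mean_second_match(k, m))
          for k, m in [(2, 5), (3, 5), (4, 5), (3, 4), (3, 6)]}
```

The dictionary `table1` holds the pairs (mean of first k-match, mean of second k-match) as exact fractions, so they can be compared directly with the fractions printed in Table 1.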

2.3. Two matches within a memory window of size k

In this subsection, we study the distribution of the waiting time for the first occurrence of two matches within a memory window of size $k$. Let $\tau$ be the waiting time. In the example of customer loss, $\tau$ means the waiting time for the first loss of two customers within $k$ minutes.

Let $\xi(t)$ be the p.g.f. of $\tau$. For $i = 1, 2, \ldots, k-1$, let $\xi_i(t)$ be the p.g.f. of the conditional distribution of $\tau$ from the $i$th point of time given that $X_1, X_2, \ldots, X_i$ have already been observed and the first k-match has not been observed up to $X_i$. Suppose that we have observed at least $k$ $X$'s and that the $k$ most recent observations are different from each other. Given this condition, let $\xi_k(t)$ be the p.g.f. of the conditional distribution of $\tau$ from this point of time. For $i = 1, 2, \ldots, k-1$, let $\xi_i^{(h,\ell)}(t)$ be the p.g.f. of the conditional distribution of $\tau$ from the $i$th point of time given that we have observed up to $X_i$ and that the first k-match has already occurred between the $h$th and $\ell$th observations going back to the past from the $i$th point of time (i.e., the first k-match has occurred between $X_{i-\ell+1}$ and $X_{i-h+1}$). Suppose that we have observed at least $k$ $X$'s and that we have observed a k-match between the $h$th and $\ell$th observations going back to the past from that point of time, for $1 \le h < \ell \le k$. Given this condition, let $\xi_k^{(h,\ell)}(t)$ be the p.g.f. of the conditional distribution of $\tau$ from that point of time. By considering the condition one step ahead from every condition, we obtain the next proposition.

Proposition 2.2. Under the above conditions, the following result holds in two separate cases.

Case 1: $k \ge m+1$. The following relations hold for the p.g.f.'s:

\[
\begin{aligned}
\xi(t) &= t\,\xi_1(t),\\
\xi_i(t) &= \frac{1}{m}\,t \sum_{j=2}^{i+1} \xi_{i+1}^{(1,j)}(t) + \frac{m-i}{m}\,t\,\xi_{i+1}(t) && \text{for } i = 1, 2, \ldots, m-1,\\
\xi_m(t) &= \frac{1}{m}\,t \sum_{j=2}^{m+1} \xi_{m+1}^{(1,j)}(t),\\
\xi_i^{(h,\ell)}(t) &= \frac{i-1}{m}\,t + \frac{m-i+1}{m}\,t\,\xi_{i+1}^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell \le i;\ i = 2, 3, \ldots, m,\\
\xi_{m+1}^{(h,\ell)}(t) &= t && \text{for } 1 \le h < \ell \le m+1.
\end{aligned}
\tag{7}
\]

Table 3
Mean and variance of $\tau$

(k, m)            Mean      Variance
(3, 3)            5.095     4.045
(3, 4)            6.825     14.244
(3, 5)            8.908     34.316
(3, 6)            11.333    68.472
(4, 4)            5.078     1.973
(4, 5)            5.964     5.283
(4, 6)            6.979     10.978
(5, 5)            5.355     1.560
(5, 6)            5.936     3.134
(5, 7)            6.574     5.647
(6, 6)            5.695     1.599
(6, 7)            6.139     2.517
(6, 8)            6.606     3.896
(k, 3), k > 3     4.333     0.444
(k, 4), k > 4     4.828     0.736
(k, 5), k > 5     5.266     1.049
(k, 6), k > 6     5.662     1.378

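Case 2: $k \le m$. The following relations hold for the p.g.f.'s:

\[
\begin{aligned}
\xi(t) &= t\,\xi_1(t),\\
\xi_i(t) &= \frac{1}{m}\,t \sum_{j=2}^{i+1} \xi_{i+1}^{(1,j)}(t) + \frac{m-i}{m}\,t\,\xi_{i+1}(t) && \text{for } i = 1, 2, \ldots, k-1,\\
\xi_k(t) &= \frac{1}{m}\,t \sum_{j=2}^{k} \xi_k^{(1,j)}(t) + \frac{m-k+1}{m}\,t\,\xi_k(t),\\
\xi_i^{(h,\ell)}(t) &= \frac{i-1}{m}\,t + \frac{m-i+1}{m}\,t\,\xi_{i+1}^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell \le i < k,\\
\xi_k^{(h,\ell)}(t) &= \frac{k-1}{m}\,t + \frac{m-k+1}{m}\,t\,\xi_k^{(h+1,\ell+1)}(t) && \text{for } 1 \le h < \ell < k,\\
\xi_k^{(h,k)}(t) &= \frac{k-1}{m}\,t + \frac{m-k+1}{m}\,t\,\xi_k(t) && \text{for } 1 \le h < k.
\end{aligned}
\tag{8}
\]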

Remark 2.1. The system of equations (7) in the case $k \ge m+1$ coincides with the corresponding system of equations (5) for the second k-match. The values of the mean and variance of $\tau$ for some specified values of $k$ and $m$ are given in Table 3.
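For k ≤ m, the means in Table 3 follow from system (8) in the same way as before: differentiating at t = 1 gives linear equations for the conditional means. Because the state with k distinct recent observations can be re-entered, its mean appears on both sides of the equations. The sketch below handles this by noting that every quantity is affine in that unknown, so evaluating the right-hand side at two points is enough to solve for it. The function name `mean_tau` and this solution strategy are illustrative assumptions.

```python
from fractions import Fraction


def mean_tau(k, m):
    """Mean waiting time for two matches within a window of size k, case 2 <= k <= m."""
    if not 2 <= k <= m:
        raise ValueError("system (8) is stated for 2 <= k <= m")

    def pair_mean(i, l, z):
        # One match currently inside the window; l is the position (counted
        # backwards) of the older matched observation; z is the mean from the
        # state whose window holds k distinct values.
        stay = Fraction(m - i + 1, m)
        if i < k:
            return 1 + stay * pair_mean(i + 1, l + 1, z)
        if l < k:
            return 1 + stay * pair_mean(k, l + 1, z)
        return 1 + stay * z

    def rhs(z):
        # Right-hand side of the mean equation for the k-distinct state (affine in z).
        return 1 + sum(pair_mean(k, j, z) for j in range(2, k + 1)) / m + Fraction(m - k + 1, m) * z

    r0, r1 = rhs(Fraction(0)), rhs(Fraction(1))
    z = r0 / (1 - (r1 - r0))                   # solves z = r0 + (r1 - r0) * z
    mean = z
    for i in range(k - 1, 0, -1):
        mean = 1 + sum(pair_mean(i + 1, j, z) for j in range(2, i + 2)) / m + Fraction(m - i, m) * mean
    return 1 + mean


means = {(k, m): float(mean_tau(k, m)) for k, m in [(3, 3), (3, 4), (3, 5)]}
```

Rounded to three decimals, the entries of `means` can be compared with the first rows of Table 3.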

Fig. 1. The distribution of the waiting time for the first two matches within a memory window of size k (k = 3 and m = 5). [Plot titled "Time till the first two matches (k = 3, m = 5)"; horizontal axis from 0 to 25, vertical axis from 0 to 0.2.]

The values of the probability functions can be obtained numerically and symbolically. For example, from Proposition 2.2 the p.g.f. of the distribution for $k = 3$ and $m = 5$ is

\[ \xi(t) = -\frac{1}{5}\,\frac{t^3\,(25 + 3t^3 - 3t^2 + 105t)}{-125 + 9t^3 + 15t^2 + 75t}. \]

By using the Taylor series expansion of $\xi(t)$ with respect to $t$ about 0, we obtain

\[
\begin{aligned}
\xi(t) = {} & \frac{1}{25}t^{3} + \frac{24}{125}t^{4} + \frac{72}{625}t^{5} + \frac{312}{3125}t^{6} + \frac{1368}{15625}t^{7} + \frac{5688}{78125}t^{8}\\
& + \frac{23976}{390625}t^{9} + \frac{101304}{1953125}t^{10} + \frac{427032}{9765625}t^{11} + \frac{1800792}{48828125}t^{12}\\
& + \frac{7595208}{244140625}t^{13} + \frac{32031288}{1220703125}t^{14} + \frac{135086616}{6103515625}t^{15}\\
& + \frac{569710584}{30517578125}t^{16} + \frac{2402673192}{152587890625}t^{17} + \frac{10132930872}{762939453125}t^{18}\\
& + \frac{42734207448}{3814697265625}t^{19} + \frac{180225473688}{19073486328125}t^{20} + \frac{760075421256}{95367431640625}t^{21}\\
& + \frac{3205510551864}{476837158203125}t^{22} + \frac{13518787182552}{2384185791015625}t^{23}\\
& + \frac{57013571994552}{11920928955078125}t^{24} + \frac{240446672498088}{59604644775390625}t^{25} + o(t^{25}).
\end{aligned}
\]

Fig. 1 is a graph of the distribution of the waiting time for the first two matches within a memory window of size $k$ for $k = 3$ and $m = 5$.
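The Taylor coefficients of a rational p.g.f. can be generated without symbolic software by ordinary power-series division. The following sketch applies this to the p.g.f. for k = 3 and m = 5 quoted above; the helper name `series` is an illustrative assumption, and the coefficient lists are transcribed from the displayed rational function.

```python
from fractions import Fraction


def series(num, den, n_terms):
    """First n_terms Taylor coefficients about 0 of num(t) / den(t).
    num and den list polynomial coefficients by increasing power; den[0] != 0."""
    coeffs = []
    for n in range(n_terms):
        s = num[n] if n < len(num) else Fraction(0)
        for j in range(1, min(n, len(den) - 1) + 1):
            s -= den[j] * coeffs[n - j]
        coeffs.append(s / den[0])
    return coeffs


# Numerator: -(1/5) t^3 (25 + 105 t - 3 t^2 + 3 t^3); denominator: -125 + 75 t + 15 t^2 + 9 t^3
num = [Fraction(c, 5) for c in (0, 0, 0, -25, -105, 3, -3)]
den = [Fraction(c) for c in (-125, 75, 15, 9)]
pmf = series(num, den, 26)      # pmf[n] is the probability that the waiting time equals n
```

The list `pmf` contains exact fractions, so its entries can be set against the printed expansion term by term, and plotting `pmf[n]` against n gives a bar chart of the kind described for Fig. 1.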


3. Dependent sequences

In this section, we consider the k-match problem when the observations are not necessarily independent or identically distributed. In order to treat the problem generally, we assume that the observations $X_{-\ell+1}, \ldots, X_0, X_1, X_2, \ldots$ are $\{1, 2, \ldots, m\}$-valued $\ell$th order Markov dependent trials with initial probabilities

\[ \pi_{x_1,\ldots,x_\ell} = P(X_{-\ell+1} = x_1, \ldots, X_0 = x_\ell) \]

and transition probabilities

\[ p_{x_1,\ldots,x_\ell,x_{\ell+1}} = P(X_i = x_{\ell+1} \mid X_{i-1} = x_\ell, \ldots, X_{i-\ell} = x_1), \]

for $x_1, \ldots, x_{\ell+1} = 1, 2, \ldots, m$, where $\ell$ is a fixed positive integer. For simplicity, we deal only with the case $k < m$. However, the other case can be treated similarly and the distribution is simpler than in our case. We also assume that $\ell \le k$ for notational simplicity. Of course, the other case can be dealt with similarly.

Let us study the distribution of the first k-match in the sequence $X_1, X_2, \ldots$ (i.e., the subsequence $\{X_{-\ell+1}, \ldots, X_0\}$ is used only as the initial condition for generating $X_1, X_2, \ldots$ and is not used for matching). Let $\phi(t)$ be the p.g.f. of the waiting time for the first k-match and let $\phi^{(x_1,\ldots,x_\ell)}(t)$ be the p.g.f. of the conditional distribution of the waiting time given that $X_{-\ell+1} = x_1, \ldots, X_0 = x_\ell$. For $i = 1, \ldots, k-1$, suppose that $X_{-\ell+1}, \ldots, X_i$ have just been observed and the first k-match has not yet occurred. Then let $\phi^{(x_{i-\ell+1},\ldots,x_i)}_{(x_1,x_2,\ldots,x_i)}(t)$ be the conditional p.g.f. of the waiting time from the $i$th point of time given that $X_{i-\ell+1} = x_{i-\ell+1}, \ldots, X_i = x_i$ and $X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i$. Suppose that we have observed up to $X_j$ ($j \ge k$) and that the first k-match has not yet occurred. Then we let $\phi^{(x_{k-\ell+1},\ldots,x_k)}_{(x_1,x_2,\ldots,x_k)}(t)$ be the p.g.f. of the conditional distribution of the waiting time from the $j$th point of time given that $X_j = x_k, X_{j-1} = x_{k-1}, \ldots, X_{j-k+1} = x_1$. Clearly, the conditional distribution does not depend on $j$. By considering the condition one step ahead from every condition, we obtain the next proposition.

Proposition 3.1. Under the above condition, the conditional p.g.f.'s satisfy the following recurrence relations:

\[ \phi(t) = \sum_{x_1,\ldots,x_\ell=1}^{m} \pi_{x_1,\ldots,x_\ell}\,\phi^{(x_1,\ldots,x_\ell)}(t), \tag{9} \]

\[ \phi^{(x_1,\ldots,x_\ell)}(t) = \sum_{x_{\ell+1}=1}^{m} p_{x_1,\ldots,x_\ell,x_{\ell+1}}\,t\,\phi^{(x_2,\ldots,x_{\ell+1})}_{(x_{\ell+1})}(t), \tag{10} \]

for $i = 1, 2, \ldots, k-1$,

\[
\phi^{(x_{i-\ell+1},\ldots,x_i)}_{(x_1,x_2,\ldots,x_i)}(t) = \sum_{y \in \{x_1,\ldots,x_i\}} p_{x_{i-\ell+1},\ldots,x_i,y}\,t + \sum_{y \notin \{x_1,\ldots,x_i\}} p_{x_{i-\ell+1},\ldots,x_i,y}\,t\,\phi^{(x_{i-\ell+2},\ldots,x_i,y)}_{(x_1,x_2,\ldots,x_i,y)}(t)
\tag{11}
\]

Fig. 2. The conditional distribution of the waiting time for the first k-match in Markov dependent trials. [Plot titled "Waiting time till the first k-match"; horizontal axis from 0 to 14, vertical axis from 0 to 0.4.]

and

\[
\phi^{(x_{k-\ell+1},\ldots,x_k)}_{(x_1,x_2,\ldots,x_k)}(t) = \sum_{y \in \{x_1,\ldots,x_k\}} p_{x_{k-\ell+1},\ldots,x_k,y}\,t + \sum_{y \notin \{x_1,\ldots,x_k\}} p_{x_{k-\ell+1},\ldots,x_k,y}\,t\,\phi^{(x_{k-\ell+2},\ldots,x_k,y)}_{(x_2,\ldots,x_k,y)}(t).
\tag{12}
\]

Remark 3.1. The $k!\binom{m}{k}$ linear equations in (12) have exactly $k!\binom{m}{k}$ unknown p.g.f.'s $\phi^{(x_{k-\ell+1},\ldots,x_k)}_{(x_1,x_2,\ldots,x_k)}(t)$, and hence the system of equations (12) can be solved. Then the recurrence relations (11) and (10) can be solved by using the solution of (12). Consequently, $\phi(t)$ can be obtained from (9). Even if the values of $k$, $m$ and $\ell$ are not so small, we can solve the linear equations by using computer algebra systems.

Fig. 2 is a graph of the probability function of the conditional distribution of the waiting time for the first k-match ($k = 3$, $m = 5$) in a Markov chain $X_0, X_1, \ldots$, given $X_0 = 1$, with the following transition probabilities:

\[
(p_{i,j}) =
\begin{pmatrix}
0.3 & 0.25 & 0.15 & 0.15 & 0.15\\
0.15 & 0.4 & 0.15 & 0.15 & 0.15\\
0.15 & 0.15 & 0.4 & 0.15 & 0.15\\
0.15 & 0.15 & 0.15 & 0.4 & 0.15\\
0.15 & 0.15 & 0.15 & 0.25 & 0.3
\end{pmatrix}.
\]

Example. A modification of the example of customer loss. Assume that there are $m$ servers, that customers arrive at one-minute intervals and that service takes $k$ minutes. Suppose that the first customer chooses a server at random and that every customer except the first can see which server was chosen by the previous customer. Thus, assume that he chooses at random a server other than the one chosen by the previous customer. In this case, the generated sequence $X_1, X_2, \ldots$ becomes a Markov chain with initial probabilities

\[ P(X_1 = i) = \frac{1}{m} \qquad \text{for } i = 1, 2, \ldots, m, \]

and transition probabilities

\[
P(X_i = y \mid X_{i-1} = x) =
\begin{cases}
0 & \text{if } x = y,\\[2pt]
\dfrac{1}{m-1} & \text{if } x \ne y.
\end{cases}
\]

Let $\phi(t)$ be the p.g.f. of the waiting time for the first k-match. For $i = 1, 2, \ldots, k-1$, let $\phi_i(t)$ be the p.g.f. of the waiting time for the first k-match from the $i$th point of time given that $X_1, X_2, \ldots, X_i$ have just been observed and that the first k-match has not yet occurred. Suppose that we have observed at least $k$ $X$'s and that the $k$ most recent observations are different from each other. Given this condition, let $\phi_k(t)$ be the p.g.f. of the conditional distribution of the waiting time for the first k-match from that point of time. Then, from Proposition 3.1 and from the simplicity of the transition probabilities, we have the following system of equations:

\[
\begin{aligned}
\phi(t) &= t\,\phi_1(t),\\
\phi_1(t) &= t\,\phi_2(t),\\
\phi_2(t) &= \frac{1}{m-1}\,t + \frac{m-2}{m-1}\,t\,\phi_3(t),\\
&\ \,\vdots\\
\phi_k(t) &= \frac{k-1}{m-1}\,t + \frac{m-k}{m-1}\,t\,\phi_k(t).
\end{aligned}
\tag{13}
\]

From the last equation of (13), we have

\[ \phi_k(t) = \frac{(k-1)\,t}{(m-1) - (m-k)\,t}. \]

Then Eqs. (13) can be solved and we obtain

\[
\begin{aligned}
\phi(t) = {} & \frac{1}{m-1}\,t^3 + \frac{m-2}{m-1}\,\frac{2}{m-1}\,t^4 + \frac{m-2}{m-1}\,\frac{m-3}{m-1}\,\frac{3}{m-1}\,t^5 + \cdots\\
& + \frac{m-2}{m-1}\,\frac{m-3}{m-1} \cdots \frac{m-(k-2)}{m-1}\,\frac{k-2}{m-1}\,t^k\\
& + \frac{m-2}{m-1}\,\frac{m-3}{m-1} \cdots \frac{m-(k-1)}{m-1}\,\frac{k-1}{m-1}\,t^{k+1}\,\frac{1}{1 - ((m-k)/(m-1))\,t}.
\end{aligned}
\tag{14}
\]
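For first-order Markov dependence, the distribution of the first k-match can also be computed directly by tracking the probability of each window of recent, mutually distinct observations, which is the probabilistic content of Proposition 3.1 with ℓ = 1. The sketch below is a minimal forward recursion of this kind, not the authors' computer-algebra procedure. The function name, the 0-based state labels and the choice k = 3, m = 5 for the example are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def first_k_match_pmf(P, init, k, n_max):
    """P(T = n) for n = 0..n_max in a first-order Markov chain.
    P[x][y] is the transition probability from x to y (states 0..m-1);
    init[x] is the distribution of X_1."""
    m = len(P)
    # alive[w]: probability that no k-match has occurred and that the last
    # (at most k) observations are the tuple w
    alive = {(x,): init[x] for x in range(m) if init[x]}
    pmf = [0] * (n_max + 1)
    for n in range(2, n_max + 1):
        nxt = defaultdict(int)
        for w, pr in alive.items():
            for y in range(m):
                q = pr * P[w[-1]][y]
                if y in w:
                    pmf[n] += q                  # X_n duplicates one of the k preceding values
                elif q:
                    nxt[(w + (y,))[-k:]] += q    # keep only the k most recent values
        alive = nxt
    return pmf


m, k = 5, 3
# Modified customer-loss example: never choose the previous customer's server.
P_ex = [[Fraction(0) if x == y else Fraction(1, m - 1) for y in range(m)] for x in range(m)]
pmf_ex = first_k_match_pmf(P_ex, [Fraction(1, m)] * m, k, 8)

# Transition matrix quoted for Fig. 2; X_0 = 1 only supplies the law of X_1.
P_fig2 = [
    [0.30, 0.25, 0.15, 0.15, 0.15],
    [0.15, 0.40, 0.15, 0.15, 0.15],
    [0.15, 0.15, 0.40, 0.15, 0.15],
    [0.15, 0.15, 0.15, 0.40, 0.15],
    [0.15, 0.15, 0.15, 0.25, 0.30],
]
pmf_fig2 = first_k_match_pmf(P_fig2, P_fig2[0], k, 14)
```

For the example, the entries of `pmf_ex` reproduce the coefficients of (14): with m = 5 and k = 3 the coefficient of t^3 is 1/(m − 1) = 1/4 and that of t^4 is ((m − 2)/(m − 1))((k − 1)/(m − 1)) = 3/8. The list `pmf_fig2` gives the probabilities that would be plotted for the chain described for Fig. 2.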


References

Aki, S., 1992. Waiting time problems for a sequence of discrete random variables. Ann. Inst. Statist. Math. 44, 363–378.
Arnold, B.C., 1972. The waiting time until first duplication. J. Appl. Probab. 9, 841–846.
Burghardt, P.D., Godbole, A.P., Prengaman, A.B., 1994. A Poisson approximation for the number of k-matches. Statist. Probab. Lett. 21, 1–8.
Ebneshahrashoob, M., Sobel, M., 1990. Sooner and later problems for Bernoulli trials: frequency and run quotas. Statist. Probab. Lett. 9, 5–11.
Herzog, J., Mclaren, C., Godbole, A.P., 1998. Generalized k-matches. Statist. Probab. Lett. 33, 167–175.
Hirano, K., Aki, S., 1999. Use of probability generating function for distribution theory of runs. Proc. Inst. Statist. Math. 47, 105–118 (in Japanese).
Mase, S., 1992. Approximations to the birthday problem with unequal occurrence probabilities and their application to the surname problem in Japan. Ann. Inst. Statist. Math. 44, 479–499.
Rao, C.R., Srivastava, R.C., Talwalker, S., Edgar, G.A., 1980. Characterization of probability distributions based on a generalized Rao-Rubin condition. Sankhyā A 42, 161–169.