Patterns generated by mth-order Markov chains


Statistics and Probability Letters 80 (2010) 1157–1166

Contents lists available at ScienceDirect

Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro

Evan Fisher a,∗,1, Shiliang Cui b

a Department of Mathematics, Lafayette College, 111 Quad Drive, Easton, PA 18042, United States
b OPIM Department, The Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 19104, United States

Article history: Received 1 February 2010; accepted 15 March 2010; available online 24 March 2010

Keywords: Patterns; Markov chains; Waiting time

Abstract

We derive an expression for the expected time for a pattern to appear in higher-order Markov chains, with and without a starting sequence. This yields a direct method for calculating the expected first time at which one of a collection of patterns appears, together with the probability, for each pattern, that it is the first to appear.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Let $S$ be a non-empty finite set. A pattern from $S$ is any finite sequence of elements from $S$. A pattern $A$ of length $n \in \mathbb{N} = \{1, 2, \ldots\}$ is denoted by $A = a_1 a_2 \ldots a_n$, where $a_i \in S$ for every $i = 1, 2, \ldots, n$. In the case of a pattern generated by a sequence of random variables $Z_1, Z_2, \ldots, Z_n$, we denote the random pattern similarly as $Z_1 Z_2 \ldots Z_n$. (In the context of this paper, this notation does not represent a product.) We introduce the following notation for typographical efficiency.

Notation 1.1. Let $A = a_1 a_2 \ldots a_n$ be a pattern from $S$. For each $m = 1, 2, \ldots, n$, let $A_m = a_1 a_2 \ldots a_m$ and let $\bar{A}_m = a_{n-m+1} \ldots a_n$. That is, $A_m$ represents the pattern consisting of the first $m$ elements of $A$ and $\bar{A}_m$ represents the pattern consisting of the last $m$ elements of $A$. For $k \le m \le n$, we define the pattern $A_{m,k}$ by $A_{m,k} = a_{m-k+1} \ldots a_m$. (In what follows, all indices are assumed to take values in $\mathbb{N}$, unless explicitly indicated otherwise.) We adopt the same notation for random patterns $Z_1 \ldots Z_n$ with the exception that, to avoid ambiguity, we denote the random pattern $Z_1 Z_2 \ldots Z_m$ by $Z_{m,m}$, rather than $Z_m$.

Let $Z_1, Z_2, \ldots$ be an $m$th-order, irreducible, homogeneous Markov chain defined on a probability space $(\Omega, \mathcal{F}, P)$ with finite state space $S$. That is, for every $n \in \mathbb{N}$ with $n \ge m$, and for $s, s_1, \ldots, s_m \in S$ and $x_1, x_2, \ldots, x_{n-m} \in S$ such that $P(Z_n = s_m, \ldots, Z_{n-m+1} = s_1, Z_{n-m} = x_{n-m}, \ldots, Z_1 = x_1) > 0$, we have

$$P(Z_{n+1} = s \mid Z_n = s_m, \ldots, Z_{n-m+1} = s_1, Z_{n-m} = x_{n-m}, \ldots, Z_1 = x_1) = P(Z_{n+1} = s \mid Z_n = s_m, \ldots, Z_{n-m+1} = s_1). \tag{1.1}$$

We denote the latter conditional probability by $P_{s_1 \cdots s_m, s}$.
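The index conventions of Notation 1.1 are easy to misread, so the following minimal sketch spells them out with Python slices. It assumes patterns are stored as strings over single-character states; the helper names `head`, `tail` and `window` are illustrative and do not appear in the paper.

```python
def head(A, m):
    """A_m: the first m elements of A."""
    return A[:m]

def tail(A, m):
    """A-bar_m: the last m elements of A."""
    return A[len(A) - m:]

def window(A, m, k):
    """A_{m,k} = a_{m-k+1} ... a_m: the k elements of A ending at position m (1-based)."""
    return A[m - k:m]

A = "12321"
print(head(A, 2), tail(A, 2), window(A, 4, 2))  # 12 21 32
```

With this convention, the random pattern $Z_{k,n}$ corresponds to `window(z, k, n)` applied to a realization `z` of the chain.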



Corresponding author. E-mail addresses: [email protected] (E. Fisher), [email protected] (S. Cui).

1 Tel.: +1 610 330 5281; fax: +1 610 330 5721. 0167-7152/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2010.03.011


Notation 1.2. Let $\mu$ be a probability distribution on the set of all patterns of length $m$ from $S$ and suppose that $\mu$ is the initial distribution of the Markov chain. That is, suppose that if $A = a_1 \ldots a_m$ is a pattern from $S$, then $\mu_A = P(Z_1 = a_1, \ldots, Z_m = a_m)$. We denote the probability of an event $C$ associated with the Markov chain with initial distribution $\mu$ by $P_\mu(C)$, and if $\mu_A = 1$, then we denote $P_\mu(C)$ by $P_A(C)$.

Definition 1.1. Let $B = b_1 \ldots b_n$ be a pattern from $S$ where $n > m$. The pattern $B$ is said to be observable if

$$\mu_{B_m} \prod_{j=1}^{n-m} P_{B_{m+j-1,m},\, b_{m+j}} > 0. \tag{1.2}$$

Let $B = b_1 \ldots b_n$ be an observable pattern and define the random variable $T$ by

$$T = \inf\{k \ge n : Z_{k,n} = B\}. \tag{1.3}$$
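As a concrete reading of (1.2) and (1.3), the sketch below checks observability and evaluates $T$ on a finite realization. The two-state, second-order chain used here is purely hypothetical (it is not taken from the paper), and the dictionary layout `P[context][next_state]` and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def is_observable(B, m, mu, P):
    """Condition (1.2): mu_{B_m} * prod_j P_{B_{m+j-1,m}, b_{m+j}} > 0."""
    prob = mu.get(B[:m], 0)
    for j in range(1, len(B) - m + 1):
        context = B[j - 1:m + j - 1]          # B_{m+j-1,m} = b_j ... b_{m+j-1}
        prob *= P.get(context, {}).get(B[m + j - 1], 0)
    return prob > 0

def waiting_time(z, B):
    """T of (1.3) on a finite realization z (Z_1 is z[0]); None if B does not occur."""
    n = len(B)
    for k in range(n, len(z) + 1):
        if z[k - n:k] == B:
            return k
    return None

# Hypothetical chain on S = {0, 1} with m = 2: the context "11" is always followed by "0".
half = Fraction(1, 2)
P = {"00": {"0": half, "1": half}, "01": {"0": half, "1": half},
     "10": {"0": half, "1": half}, "11": {"0": Fraction(1)}}
mu = {c: Fraction(1, 4) for c in P}

print(is_observable("1101", 2, mu, P))   # True
print(is_observable("111", 2, mu, P))    # False: P_{11,1} = 0
print(waiting_time("0011011", "1101"))   # 6
```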

Our main result, Theorem 3.1, gives a closed-form expression for $ET$, the expected value of the first time, $T$, that the pattern $B$ is generated by the Markov chain $\{Z_i\}_{i \in \mathbb{N}}$. Theorem 3.1 generalizes the results of Li (1980) and Benevento (1984) for patterns generated by i.i.d. random variables and first-order Markov chains, respectively. Our proof is based partially upon a modification of the martingale construction used by Li (1980) (recent papers by Pozdnyakov (2008a,b) and Glaz et al. (2006) employ the same modification). When this construction is combined with the occupation measure approach of Benevento (1984), we obtain an efficient method for computing the expected time for a given pattern to appear in higher-order Markov chains. This is the content of Sections 2 and 3 and is illustrated by Example 3.1.

This result is applied, in Section 4, to derive a closed-form expression for the expected number of transitions from a starting pattern to the first appearance of a given target pattern (Theorem 4.1). This extends a result of Li (1980) for the i.i.d. case. In Section 5, we apply this result, using a technique employed in Gerber and Li (1981), to obtain the expected time at which a pattern from a finite collection of patterns first appears, as well as the probability that each pattern in the collection is the first to appear. The result is a direct and efficient procedure for computing these quantities, and it is an alternative to the approach in Pozdnyakov (2008a) and to the generating function approach of Fu and Lou (2006) for computing the former. This is illustrated by Example 5.1.

2. Martingale construction

For every $j = 1, 2, \ldots, m-1$ and for all $t = 1, 2, \ldots$, define $M_t^{(j)} = 0$. Similarly, for every $j = m, m+1, \ldots$ and every $t = 1, 2, \ldots, j$, define $M_t^{(j)} = 0$.

For $j \ge m$ and $j < t \le j + n - m$, define $M_t^{(j)}$ by

$$M_t^{(j)} = \begin{cases} 0 & \text{if } Z_{j,m} \ne B_m, \\[4pt] \left( \prod_{k=1}^{t-j} P_{B_{m+k-1,m},\, b_{m+k}} \right)^{-1} - 1 & \text{if } Z_{t,m+t-j} = B_{m+t-j}, \\[4pt] -1 & \text{if } Z_{j,m} = B_m \text{ and } Z_{t,m+t-j} \ne B_{m+t-j}. \end{cases} \tag{2.1}$$

For $t > j + n - m$, define $M_t^{(j)}$ by

$$M_t^{(j)} = M_{j+n-m}^{(j)}. \tag{2.2}$$

The process $\{M_t^{(j)}\}_{t \in \mathbb{N}}$ represents the following gambling game. For each $j \ge m$, a gambler arrives at the game just after $Z_j$ has been observed. If the sequence of $m$ observations ending with $Z_j$ does not match $B_m$, the gambler does not enter the game. Otherwise, the gambler enters the game betting one dollar that the next observation, $Z_{j+1}$, generates the next element of the target pattern $B$. If that occurs, the gambler receives $(P_{B_m, b_{m+1}})^{-1}$ dollars and bets that amount on the outcome $Z_{j+2} = b_{m+2}$. If $Z_{j+2} \ne b_{m+2}$, the gambler sustains a net loss of 1 dollar. If $Z_{j+2} = b_{m+2}$, the gambler receives $(P_{B_m, b_{m+1}} P_{B_{m+1,m}, b_{m+2}})^{-1}$ dollars and must bet that amount on the event that the next element generated continues the pattern $B$. This pattern of play continues until either $B$ is generated, in which case the net gain of the gambler is fixed at $\left( \prod_{k=1}^{n-m} P_{b_k \ldots b_{m+k-1},\, b_{m+k}} \right)^{-1} - 1$, or a generated element fails to continue the pattern, in which case the gambler sustains a fixed net loss of 1. The game has been defined so that it is a fair game and, as noted in Pozdnyakov (2008a,b), the process $\{M_t^{(j)}\}$ is a martingale. This is the content of Proposition 2.1.

Proposition 2.1. For each $j = 1, 2, \ldots$, the process $\{M_t^{(j)}, \mathcal{F}_t\}_{t \in \mathbb{N}}$ is a martingale, where $\mathcal{F}_t = \sigma(Z_1, Z_2, \ldots, Z_t)$ is the sigma algebra generated by the random variables $Z_1, Z_2, \ldots, Z_t$.

Proof. The result follows by direct calculation using Eqs. (2.1) and (2.2).
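The fairness of a single bet can be checked numerically. The sketch below evaluates $M_t^{(j)}$ from (2.1) and (2.2) on a given realization; the function name `gambler_gain` and the dictionary layout `P[context][next_state]` are illustrative assumptions. The only probabilities used are $P_{32,2} = 1/3$ and $P_{32,3} = 2/3$, taken from Example 3.1, with $B = 323$ and $m = 2$.

```python
from fractions import Fraction

def gambler_gain(z, j, t, B, m, P):
    """Net gain M_t^{(j)} of the gambler who arrives just after Z_j (z[0] is Z_1).

    Requires len(z) >= min(t, j + len(B) - m).
    """
    n = len(B)
    if j < m or t <= j:
        return Fraction(0)                    # no bet has been settled yet
    if z[j - m:j] != B[:m]:
        return Fraction(0)                    # Z_{j,m} != B_m: the gambler never enters
    holdings = Fraction(1)
    last = min(t, j + n - m)                  # (2.2): the gain is frozen after n - m bets
    for k in range(1, last - j + 1):
        if z[j + k - 1] != B[m + k - 1]:      # Z_{j+k} fails to continue the pattern
            return Fraction(-1)
        holdings /= P[B[k - 1:m + k - 1]][B[m + k - 1]]
    return holdings - 1

P = {"32": {"2": Fraction(1, 3), "3": Fraction(2, 3)}}
win = gambler_gain("1323", j=3, t=4, B="323", m=2, P=P)    # Z_4 = 3 completes B
loss = gambler_gain("1322", j=3, t=4, B="323", m=2, P=P)   # Z_4 = 2 breaks the pattern
print(win, loss)                                            # 1/2 -1
print(Fraction(2, 3) * win + Fraction(1, 3) * loss)         # 0: the bet is fair
```

The last line is the one-step martingale identity for this gambler: conditional on $Z_2 Z_3 = 32$, the expected change in the gambler's fortune is zero.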



For each $t \in \mathbb{N}$, we define $X_t$ by $X_t = \sum_{j=1}^{\infty} M_t^{(j)}$. In the context of the underlying gambling game, $X_t$ represents the net gain of all the gamblers after the $t$th play.

Proposition 2.2. The process $\{X_t, \mathcal{F}_t\}_{t \in \mathbb{N}}$ is a martingale.

Proof. Since $M_t^{(j)} = 0$ whenever $j \ge t$, each $X_t$ is a finite sum, and the result follows immediately from Proposition 2.1.



From this point, our proof represents an adaptation and modification of that used in Benevento (1984). Let $\delta : S \times S \to \mathbb{R}$ be the Kronecker delta function:

$$\delta(x, y) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise.} \end{cases}$$

We extend this definition to patterns of length greater than or equal to one. If $x = x_1 x_2 \ldots x_k$ and $y = y_1 y_2 \ldots y_k$ are patterns from $S$, we define $\delta(x; y)$ by

$$\delta(x; y) = \prod_{i=1}^{k} \delta(x_i, y_i). \tag{2.3}$$

That is, the function $\delta$ identifies whether the two patterns match. Define $N$ by

$$N = \sum_{t=m}^{T-1} 1_{\{Z_{t,m} = B_m\}}, \tag{2.4}$$

where $1_C$ is the indicator function of the event $C$ and $T$ is defined by Eq. (1.3). Then $N$ represents the number of times prior to time $T$ that the subpattern $B_m$ appears and, therefore, the number of gamblers who have entered the game (that is, placed a bet) through time $T - 1$. It follows that

$$X_T = \sum_{j=1}^{n-m} \left( \prod_{k=1}^{j} P_{B_{m+k-1,m},\, b_{m+k}} \right)^{-1} \delta(B_{m+j}; \bar{B}_{m+j}) - N. \tag{2.5}$$

(Note that $\delta(B_{m+j}; \bar{B}_{m+j}) = 1$ if and only if the first $m + j$ and last $m + j$ elements of $B$ match.) We make use of the following result (see Williams (1991), p. 101).

Lemma 2.1. Suppose that $T$ is a stopping time such that

$$P(T \le n + N \mid \mathcal{F}_n) > \varepsilon \quad \text{a.s.}$$

for some $N \in \mathbb{N}$, some $\varepsilon > 0$, and for every $n \in \mathbb{N}$. Then $ET < \infty$.

Lemma 2.2. Let $T$ be the stopping time for the pattern $B = b_1 \ldots b_n$. Then $ET < \infty$.

Proof. Since the Markov chain is irreducible and the number of states (and patterns of length $m$) is finite, there exists some $K \in \mathbb{N}$ such that for every pattern $\beta$ of length $m$, there exists $k(\beta) \in \mathbb{N}$ with $k(\beta) < K$ such that $\alpha_\beta = P_\beta(Z_{k(\beta),m} = B_m) > 0$. Let $\alpha = \min\{\alpha_\beta : \beta \text{ is a pattern of length } m\}$. Then $\alpha > 0$. Let $N = K + n$. It follows from the Markov property that

$$P(T \le k + N \mid \mathcal{F}_k) \ge \alpha \prod_{j=1}^{n-m} P_{B_{m+j-1,m},\, b_{m+j}} > 0$$

for all $k \in \mathbb{N}$. By Lemma 2.1, we obtain $ET < \infty$.



Proposition 2.3. The process $\{X_{t \wedge T}, \mathcal{F}_t\}_{t \in \mathbb{N}}$ is a uniformly integrable martingale.

Proof. It is a standard result (see Williams (1991), p. 99) that the process $\{X_{t \wedge T}, \mathcal{F}_t\}_{t \in \mathbb{N}}$ is a martingale. Let

$$c = \sum_{j=1}^{n-m} \left( \prod_{k=1}^{j} P_{B_{m+k-1,m},\, b_{m+k}} \right)^{-1} \delta(B_{m+j}; \bar{B}_{m+j}).$$

It is sufficient to show that $\{X_{t \wedge T}\}$ is bounded by an integrable random variable. Since $|X_{t \wedge T}| \le \max\{c, t \wedge T\}$ for all $t = 1, 2, \ldots$, and $ET < \infty$ (Lemma 2.2), the process $\{X_{t \wedge T}, \mathcal{F}_t\}_{t \in \mathbb{N}}$ is uniformly integrable.


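Proposition 2.4. Let $N$ be as defined by (2.4). Then

$$EN = \sum_{j=1}^{n-m} \left( \prod_{k=1}^{j} P_{B_{m+k-1,m},\, b_{m+k}} \right)^{-1} \delta(B_{m+j}; \bar{B}_{m+j}). \tag{2.6}$$

Proof. Since $X_1 = 0$, Proposition 2.3 gives $EX_T = EX_1 = 0$, and the result follows by taking expectations in (2.5).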



Let $N^*$ equal the number of times the initial segment $B_m$ of $B$ appears up to and including time $T$. Then $N^*$ can be defined by

$$N^* = N + \delta(B_m; \bar{B}_m). \tag{2.7}$$

Define $W_1$ by

$$W_1 = \min\{k \ge m \mid Z_{k,m} = B_m\}. \tag{2.8}$$

Define $W_2$ by $W_2 = \min\{k > W_1 \mid Z_{k,m} = B_m\}$, and for $j > 2$ define $W_j$ by $W_j = \min\{k > W_{j-1} \mid Z_{k,m} = B_m\}$. That is, $W_1, W_2, W_3, \ldots$ are the successive hitting times for the initial segment $B_m$ of $B$. Define $W$ by

$$W = \min\{k \ge 1 \mid Z_{m+k,m} = B_m\}. \tag{2.9}$$

If $A = a_1 a_2 \ldots a_m$ is a pattern of length $m$ from $S$, then, consistent with Notation 1.2, we define $E_A W$ by $E_A W = E[W \mid Z_{m,m} = A_m]$. Given an initial pattern $A$, $E_A W$ equals the expected number of additional transitions for the pattern $B_m$ to appear. Since the Markov chain is irreducible and the state space is finite, it is a standard result from the theory of Markov chains (e.g. see Resnick (1992), pp. 119–120) that there exists a unique stationary distribution $\pi$ and that

$$\pi_{B_m}^{-1} = E_{B_m} W < \infty. \tag{2.10}$$

Similarly, $E_{\bar{B}_m} W < \infty$. For each stopping time $W_n$, we define the $\sigma$-field $\mathcal{F}_{W_n}$ by

$$\mathcal{F}_{W_n} = \{A \in \mathcal{F} : A \cap \{W_n = k\} \in \mathcal{F}_k \text{ for all } k \in \mathbb{N}\}$$

(see Durrett (2005), p. 285).

Lemma 2.3. For each $n \in \mathbb{N}$, the event $(N^* \ge n) \in \mathcal{F}_{W_n}$.

Proof. For every $n \in \mathbb{N}$, we note that $(N^* \ge n) = (W_n \le T)$. If $k, n \in \mathbb{N}$, then

$$(W_n \le T) \cap (W_n = k) = (T \ge k) \cap (W_n = k) \in \sigma(Z_1, \ldots, Z_k).$$

Therefore, $(W_n \le T) \in \mathcal{F}_{W_n}$.



3. Main theorem: proof and corollaries

Theorem 3.1. Let $Z_1, Z_2, \ldots$ be an irreducible, $m$th-order Markov chain with finite state space $S$. Suppose that $B = b_1 \ldots b_n$ is an observable pattern from $S$. Let $T$, $N^*$, $W_1$, and $W$ be as defined by (1.3) and (2.7)–(2.9), respectively. Then

$$ET = EW_1 + (EN^*) \pi_{B_m}^{-1} - E_{\bar{B}_m} W, \tag{3.1}$$

where

$$EN^* = \sum_{j=1}^{n-m} \left( \prod_{k=1}^{j} P_{B_{m+k-1,m},\, b_{m+k}} \right)^{-1} \delta(B_{m+j}; \bar{B}_{m+j}) + \delta(B_m; \bar{B}_m). \tag{3.2}$$

Proof. Since $W_{N^*+1} = W_1 + \sum_{k=1}^{N^*} (W_{k+1} - W_k)$, then

$$EW_{N^*+1} = EW_1 + E \sum_{k=1}^{N^*} (W_{k+1} - W_k). \tag{3.3}$$

It follows from Lemma 2.3, the strong Markov property (see Durrett (2005), p. 285), and (2.10) that

$$E \sum_{k=1}^{N^*} (W_{k+1} - W_k) = E \sum_{k=1}^{\infty} (W_{k+1} - W_k) 1_{\{N^* \ge k\}} = \sum_{k=1}^{\infty} E\left[ E\left( (W_{k+1} - W_k) 1_{\{N^* \ge k\}} \mid \mathcal{F}_{W_k} \right) \right] = (EN^*) \pi_{B_m}^{-1}. \tag{3.4}$$

Therefore, we obtain

$$EW_{N^*+1} = EW_1 + (EN^*) \pi_{B_m}^{-1}. \tag{3.5}$$

Since $Z_{T,m} = \bar{B}_m$, the strong Markov property yields

$$E(W_{N^*+1} - T) = E[E(W_{N^*+1} - T \mid \mathcal{F}_T)] = E[E(W_{N^*+1} - T \mid Z_{T,m})] = E_{\bar{B}_m} W. \tag{3.6}$$

Eq. (3.1) follows from (3.5) and (3.6), and Eq. (3.2) follows from (2.7) and Proposition 2.4.
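Each term of (3.1) reduces to elementary linear algebra. As a numerical illustration, the following sketch evaluates them exactly for the second-order chain of Example 3.1 ($S = \{1, 2, 3\}$, $m = 2$, $B = 323$, $\mu_{11} = 1/3$, $\mu_{31} = 2/3$). The helper names (`solve`, `stationary`, `hitting`) and the embedding of the chain as a first-order chain on pairs are implementation choices made here, not part of the paper; the code also assumes that $\mu$ puts no mass on the pattern 32, so that $W_1 = W + m$.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(b)
    M = [list(map(F, row)) + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]

# Transition probabilities of Example 3.1, stored as P[context][next state].
P = {"11": {"1": F(1, 5), "3": F(4, 5)},
     "13": {"1": F(1, 3), "2": F(1, 3), "3": F(1, 3)},
     "22": {"2": F(1, 4), "3": F(3, 4)},
     "23": {"1": F(1, 3), "2": F(2, 3)},
     "31": {"1": F(1, 4), "3": F(3, 4)},
     "32": {"2": F(1, 3), "3": F(2, 3)},
     "33": {"1": F(2, 5), "2": F(2, 5), "3": F(1, 5)}}
states = sorted(P)                        # 11, 13, 22, 23, 31, 32, 33
idx = {s: i for i, s in enumerate(states)}
n = len(states)
Q = [[F(0)] * n for _ in states]          # first-order chain on pairs: xy -> ys
for xy, row in P.items():
    for s, p in row.items():
        Q[idx[xy]][idx[xy[1] + s]] = p

def stationary(Q):
    """Solve pi Q = pi, with the last balance equation replaced by sum(pi) = 1."""
    A = [[Q[j][i] - (i == j) for j in range(n)] for i in range(n)]
    A[-1] = [1] * n
    return solve(A, [0] * (n - 1) + [1])

def hitting(Q, target):
    """h[x] = expected number of transitions (at least one) from x until target."""
    A = [[(x == y) - (Q[x][y] if y != target else 0) for y in range(n)]
         for x in range(n)]
    return solve(A, [1] * n)

pi = stationary(Q)
h = hitting(Q, idx["32"])
mu = {"11": F(1, 3), "31": F(2, 3)}
E_W1 = 2 + sum(p * h[idx[s]] for s, p in mu.items())   # W_1 = W + m, m = 2
E_Nstar = 1 / P["32"]["3"]                # (3.2): one term, delta(32; 23) = 0
ET = E_W1 + E_Nstar / pi[idx["32"]] - h[idx["23"]]
print(pi[idx["32"]], h[idx["23"]], E_W1, ET)   # 72/307 203/72 119/16 793/72
```

The exact value $793/72 \approx 11.014$ agrees with the figure reported in Example 3.1, and `h[idx["32"]]` equals $307/72 = \pi_{32}^{-1}$, in line with (2.10).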



Example 3.1. Consider a second-order ($m = 2$) Markov chain on the state space $S = \{1, 2, 3\}$ with transition probabilities defined by

$P_{11,1} = 1/5$, $P_{11,3} = 4/5$;
$P_{13,1} = P_{13,2} = P_{13,3} = 1/3$;
$P_{22,2} = 1/4$, $P_{22,3} = 3/4$;
$P_{23,1} = 1/3$, $P_{23,2} = 2/3$;
$P_{31,1} = 1/4$, $P_{31,3} = 3/4$;
$P_{32,2} = 1/3$, $P_{32,3} = 2/3$;
$P_{33,1} = P_{33,2} = 2/5$, $P_{33,3} = 1/5$.

This chain is irreducible on the set of patterns $\{11, 13, 22, 23, 31, 32, 33\}$. Let $\mu$ be the initial distribution on this set of patterns defined by $\mu_{11} = 1/3$ and $\mu_{31} = 2/3$. Let $B = 323$ and let $T = \inf\{k \ge 3 : Z_{k,3} = B\}$. We apply Theorem 3.1 to calculate $ET$. Standard calculations determine the stationary distribution $\pi$ (with components in the order of the patterns listed above):

$$\pi = (15/307,\ 48/307,\ 32/307,\ 72/307,\ 48/307,\ 72/307,\ 20/307).$$

In the context of Theorem 3.1, $W_1 = \min\{k \ge 2 \mid Z_{k,2} = B_2 = 32\}$ and $W = \min\{k \ge 1 \mid Z_{2+k,2} = 32\}$. Using standard matrix techniques (see Resnick (1992), pp. 105–110), we obtain $E_{\bar{B}_2} W = E_{23} W = 203/72$. Given that the initial distribution $\mu$ is concentrated on the states $\{11, 31\}$, it follows that $W_1 = W + 2$ and that $EW_1 = 119/16$. It follows from Eq. (3.1) and from $EN^* = P_{32,3}^{-1} = 3/2$ that

$$ET = 119/16 + (3/2)(307/72) - 203/72 \approx 11.014.$$

3.1. Markov chains of order 1

In the case of a Markov chain $Z_1, Z_2, \ldots$ of order one, target pattern $B = b_1 \ldots b_n$ for $n \ge 2$, and stationary distribution $\pi$, the result of Theorem 3.1 becomes

$$ET = EW_1 + (EN^*) \pi_{b_1}^{-1} - E_{b_n} W, \tag{3.7}$$

where

$$EN^* = \sum_{j=1}^{n-1} \left( \prod_{k=1}^{j} P_{b_k, b_{k+1}} \right)^{-1} \delta(B_{1+j}; \bar{B}_{1+j}) + \delta(b_1; b_n). \tag{3.8}$$

We note that this is equivalent to Theorem 3.1 in Benevento (1984), and in this form, the original result of Li (1980) for the case of i.i.d. random variables follows immediately. We state the result as Corollary 3.1.

Corollary 3.1. Let $Z_1, Z_2, \ldots$ be i.i.d. random variables with support $S$, where $S$ is finite. For each $s \in S$, let $\pi_s = P(Z_i = s) > 0$. Let $B = b_1 \ldots b_n$, where $n \ge 2$, be a pattern from $S$ and let $T$ be as defined by (1.3). Then

$$ET = \sum_{k=1}^{n} (\pi_{b_1} \pi_{b_2} \cdots \pi_{b_k})^{-1} \delta(B_k; \bar{B}_k). \tag{3.9}$$
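Formula (3.9) translates directly into a few lines of code. The sketch below is a minimal illustration that assumes patterns are strings and that the marginal distribution is supplied as a dictionary; the function name `expected_wait_iid` and the fair-coin example are not taken from the paper.

```python
from fractions import Fraction

def expected_wait_iid(B, prob):
    """ET from (3.9): sum over k of (pi_{b_1} ... pi_{b_k})^{-1} * delta(B_k; B-bar_k)."""
    total = Fraction(0)
    prod = Fraction(1)
    for k in range(1, len(B) + 1):
        prod *= prob[B[k - 1]]            # pi_{b_1} * ... * pi_{b_k}
        if B[:k] == B[len(B) - k:]:       # first k elements match last k elements
            total += 1 / prod
    return total

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
for pattern in ("HTH", "HH", "HT"):
    print(pattern, expected_wait_iid(pattern, coin))   # HTH 10, HH 6, HT 4
```

For a fair coin, the self-overlapping patterns HTH and HH have longer expected waiting times than HT because the overlap terms contribute to the sum.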


4. Starting patterns and applications

Let $Z_1, Z_2, \ldots$ be a first-order, irreducible, stationary Markov chain with finite state space $S$, transition probability matrix $P$, and unique stationary distribution $\pi$. For any pattern $B = b_1 b_2 \ldots b_l$ from $S$ where $l \ge 1$, we define $\Pi_B$ by

$$\Pi_B = P(Z_1 = b_1, \ldots, Z_l = b_l) = \pi_{b_1} \prod_{i=1}^{l-1} P_{b_i b_{i+1}}.$$

That is, we take the initial distribution of the chain as $\pi$. For any two patterns $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_l$ we define, analogously to Li (1980), the notation $A * B$ by

$$A * B = \sum_{j=1}^{l} \delta(B_j; \bar{A}_j)\, \Pi_{B_j}^{-1}. \tag{4.1}$$

In this equation, $\delta(B_j; \bar{A}_j)$ identifies whether the initial subpattern of $j$ elements of $B$ matches the subpattern of the last $j$ elements of $A$ (see (2.3)).

Let $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_l$ be patterns from $S$ where $B$ is observable (see Definition 1.1), but not a subpattern of $A$. Define $N(A, B)$, the number of transitions from the last element of $A$ until $B$ is observed, by

$$N(A, B) = \min\{k \ge 1 : B \text{ is a connected subpattern of } a_1 a_2 \ldots a_n Z_1 \ldots Z_k\}. \tag{4.2}$$

A key characteristic describing the relationship between two patterns $A$ and $B$ is the maximum overlap of the head of $B$ on the tail of $A$. We define $\nu(A, B)$ by

$$\nu(A, B) = \max\{k : \delta(B_k; \bar{A}_k) = 1\}, \tag{4.3}$$

where we set $\nu(A, B) \equiv 0$ if $\{k : \delta(B_k; \bar{A}_k) = 1\} = \emptyset$.

A useful result in the sequel is that for any two patterns $A$ and $B$, it follows that

$$A * B = \bar{A}_{\nu(A,B)} * B = \bar{A}_{\nu(A,B)} * B_{\nu(A,B)}. \tag{4.4}$$
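The quantities $\Pi_B$, $A * B$ and $\nu(A, B)$ can be computed mechanically. The sketch below does so for the first-order chain of Example 5.1 (transition matrix (5.3), uniform stationary distribution) and checks identity (4.4) on one pair of patterns. The function names `Pi`, `star` and `nu` are illustrative, and truncating the sum in (4.1) at $\min(n, l)$ is an assumption made here, since $\bar{A}_j$ is only defined for $j \le n$.

```python
from fractions import Fraction as F

# Transition matrix (5.3) of Example 5.1, stored as P[from][to]; pi is uniform.
P = {"1": {"1": F(1, 4), "2": F(1, 4), "3": F(1, 2)},
     "2": {"1": F(1, 3), "2": F(1, 6), "3": F(1, 2)},
     "3": {"1": F(5, 12), "2": F(7, 12), "3": F(0)}}
pi = {s: F(1, 3) for s in P}

def Pi(B):
    """Pi_B = pi_{b_1} * prod_i P_{b_i b_{i+1}}."""
    p = pi[B[0]]
    for x, y in zip(B, B[1:]):
        p *= P[x][y]
    return p

def star(A, B):
    """A * B of (4.1), summing only over j <= min(len(A), len(B))."""
    total = F(0)
    for j in range(1, min(len(A), len(B)) + 1):
        if B[:j] == A[len(A) - j:]:       # delta(B_j; A-bar_j) = 1
            total += 1 / Pi(B[:j])
    return total

def nu(A, B):
    """nu(A, B) of (4.3): largest overlap of the head of B on the tail of A."""
    ks = [k for k in range(1, min(len(A), len(B)) + 1) if B[:k] == A[len(A) - k:]]
    return max(ks, default=0)

A, B = "113", "131"
v = nu(A, B)
tail_A = A[len(A) - v:]
print(v, star(A, B), star(B, B))                               # 2 6 87/5
print(star(A, B) == star(tail_A, B) == star(tail_A, B[:v]))    # True, as in (4.4)
```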

The following notation will be useful.

Notation 4.1. Let $\nu_0 = \nu(A, B)$. If $0 < \nu_0 < l$, let $\nu_1 = \nu(\bar{B}_{\nu_0}, B_{\nu_0})$. For $k \ge 1$, we define $\nu_{k+1}$ by $\nu_{k+1} = \nu(B_{\nu_{k-1},\nu_k}, B_{\nu_k})$. Let

$$K = \min\{k \ge 0 : \nu_{k+1} \in \{0, \nu_k\}\}. \tag{4.5}$$

For the case $0 < \nu_0 < l$, we have

$$l > \nu_0 > \nu_1 > \cdots > \nu_K = \nu_{K+1} > 0 \tag{4.6}$$

or

$$l > \nu_0 > \nu_1 > \cdots > \nu_K > \nu_{K+1} = 0. \tag{4.7}$$

Theorem 4.1, which follows, provides a simple expression for $EN(A, B)$, the expected value of $N(A, B)$. It is analogous to Lemma 2.4 in Li (1980).

Theorem 4.1. Let $\nu_0 = \nu(A, B)$.
1. If $\nu_0 = l$, then $EN(A, B) = \Pi_B^{-1}$.
2. If $\nu_0 = 0$, then $EN(A, B) = B * B + EN(a_n, b_1) - EN(b_l, b_1)$.
3. Let $K$ be as defined by (4.5) and suppose that $0 < \nu_0 < l$.
(a) If $\nu_1 = 0$, then $EN(A, B) = B * B - A * B + EN(b_l, b_1) - EN(a_n, b_1)$.
(b) If $K \ge 1$ and $\nu_{K+1} = 0$, then $EN(A, B) = B * B - A * B + (-1)^K [EN(a_n, b_1) - EN(b_{\nu_K}, b_1)]$.
(c) If $K \ge 0$ and $\nu_{K+1} = \nu_K$, then $EN(A, B) = B * B - A * B$.

Remark 1. In the case of an i.i.d. sequence of observations $\{Z_k, k \ge 1\}$, Theorem 4.1 reduces to Lemma 2.4 in Li (1980), which we state as Corollary 4.1.

Corollary 4.1. Suppose that $Z_1, Z_2, \ldots$ are i.i.d. random variables with support a finite set $S$. Let $A$ and $B$ be patterns from $S$. If $B$ is not a connected subsequence of $A$, then $EN(A, B) = B * B - A * B$.

Proof. The result follows from Theorem 4.1 and the observation that, in this setting, $EN(s, b_1) = EN(t, b_1)$ for all $s \in S$ and $t \in S$.


4.1. Proof of Theorem 4.1, parts (1) and (2)

Suppose that $\nu_0 = l$. Then $N(A, B) = N(B, B)$ is the return time to $B$. Therefore,

$$EN(A, B) = \Pi_B^{-1}. \tag{4.8}$$

Suppose that $\nu_0 = 0$. Define the probability distribution $\mu$ on $S$ by $\mu(a_n) \equiv 1$ and let $Z_1, Z_2, \ldots$ have the initial distribution $\mu$; that is, $P(Z_1 = a_n) = 1$. Let

$$W = \min\{j \ge 1 : Z_{j+1} = B_1 = b_1\} \tag{4.9}$$

and

$$W_1 = \min\{j \ge 1 : Z_j = b_1\}. \tag{4.10}$$

Let $T = \inf\{k \ge l : Z_{k,l} = B\}$, as in (1.3). Apply Theorem 3.1 (with $m = 1$) to obtain

$$ET = E_{a_n} W_1 + (EN^*) \pi_{b_1}^{-1} - E_{b_l} W. \tag{4.11}$$

Since $\nu_0 = 0$, then $a_n \ne b_1$, $E_{a_n} W_1 = 1 + E_{a_n} W = 1 + EN(a_n, b_1)$, and $N(A, B) = T - 1$. By the definition of $N$, we have $E_{b_l} W = EN(b_l, b_1)$. Therefore

$$EN(A, B) = (EN^*) \pi_{b_1}^{-1} + EN(a_n, b_1) - EN(b_l, b_1). \tag{4.12}$$

Theorem 4.1, part (2), follows from (3.8), (4.1), and the fact that $A * B = 0$.

4.2. Proof of Theorem 4.1, part (3)

We prove Theorem 4.1 for the case $K = 0$ and then by induction for $K \ge 1$. Suppose that $1 \le \nu_0 \le l - 1$. It follows that

$$b_1 \ldots b_{\nu_0} = a_{n-\nu_0+1} \ldots a_n \tag{4.13}$$

and

$$b_1 \ldots b_{\nu_0+k} \ne a_{n-(\nu_0+k)+1} \ldots a_n \tag{4.14}$$

for $k \ge 1$. We apply Theorem 3.1 with $m = \nu_0$. That is, consider the process $\{Z_k; k \ge 1\}$ as a Markov chain of order $\nu_0$. We assume an initial distribution $\mu$ defined by $\mu_{\bar{A}_{\nu_0}} = P(Z_{\nu_0,\nu_0} = \bar{A}_{\nu_0}) = 1$. As in Theorem 3.1, we define $W$ and $W_1$ by

$$W = \min\{k \ge 1 : Z_{k+1,\nu_0} = B_{\nu_0}\} \tag{4.15}$$

and

$$W_1 = \min\{k \ge 1 : Z_{k,\nu_0} = B_{\nu_0}\}. \tag{4.16}$$

Note that $N(A, B) = T - \nu_0$. It follows from Theorem 3.1 and $E_{\bar{A}_{\nu_0}} W_1 = \nu_0$ that

$$ET = EW_1 + (EN^*) \Pi_{B_{\nu_0}}^{-1} - E_{\bar{B}_{\nu_0}} W = \nu_0 + (EN^*) \Pi_{B_{\nu_0}}^{-1} - EN(\bar{B}_{\nu_0}, B_{\nu_0}) \tag{4.17}$$

and, from this, that

$$EN(A, B) = (EN^*) \Pi_{B_{\nu_0}}^{-1} - EN(\bar{B}_{\nu_0}, B_{\nu_0}). \tag{4.18}$$

By combining (3.8) with (4.1), it follows that if $1 \le \nu_0 \le l - 1$, then

$$EN(A, B) = B * B - B * B_{\nu_0} + \delta(B_{\nu_0}; \bar{B}_{\nu_0})\, \Pi_{B_{\nu_0}}^{-1} - EN(\bar{B}_{\nu_0}, B_{\nu_0}). \tag{4.19}$$

4.2.1. The case K = 0

Suppose that $\nu_1 = \nu(\bar{B}_{\nu_0}, B_{\nu_0}) = \nu_0$. Then $\delta(B_{\nu_0}; \bar{B}_{\nu_0}) = 1$ and we obtain $EN(\bar{B}_{\nu_0}, B_{\nu_0}) = \Pi_{B_{\nu_0}}^{-1}$ by (4.8). By the definition of $\nu$ (see (4.3)), the assumption that $\nu_1 = \nu_0$, and (4.4), it follows that $B * B_{\nu_0} = \bar{B}_{\nu_0} * B_{\nu_0} = B_{\nu_0} * B_{\nu_0} = A * B$. This yields the result

$$EN(A, B) = B * B - A * B. \tag{4.20}$$

Now assume that $\nu_1 = 0$. By definition (see Notation 4.1), it follows that $\delta(B_{\nu_0}; \bar{B}_{\nu_0}) = 0$ and $B * B_{\nu_0} = \bar{B}_{\nu_0} * B_{\nu_0} = 0$. We apply Theorem 4.1, part (2), to $\bar{B}_{\nu_0}$ and $B_{\nu_0}$, respectively, and obtain $EN(\bar{B}_{\nu_0}, B_{\nu_0}) = B_{\nu_0} * B_{\nu_0} + EN(b_l, b_1) - EN(b_{\nu_0}, b_1)$. This result, the fact that $B_{\nu_0} * B_{\nu_0} = A * B$, and (4.19) establish part (3a) of Theorem 4.1.


4.2.2. The case K = 1

The remainder of Theorem 4.1 is proved by induction on $K$ for $K \ge 1$. We first establish the result for $K = 1$, that is, for the case $l > \nu_0 > \nu_1 > 0$ and either $\nu_2 = \nu_1$ or $\nu_2 = 0$. We apply Eq. (4.19) to $\bar{B}_{\nu_0}$ and $B_{\nu_0}$, respectively, to obtain

$$EN(\bar{B}_{\nu_0}, B_{\nu_0}) = B_{\nu_0} * B_{\nu_0} - B_{\nu_0} * B_{\nu_1} + \delta(B_{\nu_1}; B_{\nu_0,\nu_1})\, \Pi_{B_{\nu_1}}^{-1} - EN(B_{\nu_0,\nu_1}, B_{\nu_1}). \tag{4.21}$$

Since $\nu_1 < \nu_0$, then $\delta(B_{\nu_0}; \bar{B}_{\nu_0}) = 0$ and (4.19) reduces to

$$EN(A, B) = B * B - B * B_{\nu_0} - EN(\bar{B}_{\nu_0}, B_{\nu_0}). \tag{4.22}$$

The definition of $\nu_0$ implies that

$$B_{\nu_0} * B_{\nu_0} = A * B \quad \text{and} \quad B_{\nu_0} * B_{\nu_1} = A * B_{\nu_1}. \tag{4.23}$$

The definition of $\nu_1$ implies that

$$B * B_{\nu_0} = \bar{B}_{\nu_0} * B_{\nu_0} = \bar{B}_{\nu_1} * B_{\nu_1} = B_{\nu_1} * B_{\nu_1}. \tag{4.24}$$

It follows that

$$EN(A, B) = B * B - A * B - (B * B_{\nu_1} - A * B_{\nu_1}) - \delta(B_{\nu_1}; B_{\nu_0,\nu_1})\, \Pi_{B_{\nu_1}}^{-1} + EN(B_{\nu_0,\nu_1}, B_{\nu_1}). \tag{4.25}$$

The definitions of $\nu_1$ and $\nu_2$, with (4.4), yield

$$A * B_{\nu_1} = B_{\nu_0} * B_{\nu_1} = B_{\nu_0,\nu_1} * B_{\nu_1} = B_{\nu_2} * B_{\nu_1} \tag{4.26}$$

and

$$B * B_{\nu_1} = \bar{B}_{\nu_0} * B_{\nu_1} = B_{\nu_1} * B_{\nu_1}. \tag{4.27}$$

Consider the case $\nu_2 = \nu_1$. Then $\nu_2 = \nu(B_{\nu_0,\nu_1}, B_{\nu_1}) = \nu_1$ and $\delta(B_{\nu_1}; B_{\nu_0,\nu_1}) = 1$. By Theorem 4.1 (part 1), we obtain $EN(B_{\nu_0,\nu_1}, B_{\nu_1}) = \Pi_{B_{\nu_1}}^{-1}$. It follows from the last equality in (4.27) that $B * B_{\nu_1} = B_{\nu_2} * B_{\nu_1}$ and hence that $EN(A, B) = B * B - A * B$.

Now consider the case $\nu_2 = 0$. Then $\delta(B_{\nu_1}; B_{\nu_0,\nu_1}) = 0$. We apply the previously established result, Theorem 4.1 part (2), to the patterns $B_{\nu_0,\nu_1}$ and $B_{\nu_1}$, respectively, to obtain

$$EN(B_{\nu_0,\nu_1}, B_{\nu_1}) = B_{\nu_1} * B_{\nu_1} + EN(a_n, b_1) - EN(b_{\nu_1}, b_1). \tag{4.28}$$

(Note that the last state in the pattern $B_{\nu_0,\nu_1}$ equals $a_n$.) Since $B * B_{\nu_1} = B_{\nu_1} * B_{\nu_1}$ (see (4.27)) and $A * B_{\nu_1} = B_{\nu_0} * B_{\nu_1} = 0$ (by (4.26) and the assumption that $\nu_2 = 0$), then Eqs. (4.25) and (4.28) yield $EN(A, B) = B * B - A * B + EN(a_n, b_1) - EN(b_{\nu_1}, b_1)$. This establishes Theorem 4.1 for the case $K = 1$.

4.2.3. The case K ≥ 2

Suppose that $K \ge 2$. This means that $l > \nu_0 > \cdots > \nu_{K-1} > \nu_K > 0$ and that $\nu_{K+1} = \nu_K$ or $\nu_{K+1} = 0$. Assuming the induction hypothesis and applying the result of Theorem 4.1 parts (3b) and (3c) to $\bar{B}_{\nu_0}$ and $B_{\nu_0}$ (where the corresponding index defined by (4.5) equals $K - 1$), we obtain

$$EN(\bar{B}_{\nu_0}, B_{\nu_0}) = B_{\nu_0} * B_{\nu_0} - \bar{B}_{\nu_0} * B_{\nu_0} \tag{4.29}$$

if $\nu_{K+1} = \nu_K$ and

$$EN(\bar{B}_{\nu_0}, B_{\nu_0}) = B_{\nu_0} * B_{\nu_0} - \bar{B}_{\nu_0} * B_{\nu_0} + (-1)^{K-1} [EN(a_n, b_1) - EN(b_{\nu_K}, b_1)] \tag{4.30}$$

if $\nu_{K+1} = 0$. If $\nu_{K+1} = \nu_K$, then Theorem 4.1, part (3c), follows from (4.22)–(4.24) and (4.29). If $\nu_{K+1} = 0$, then Theorem 4.1, part (3b), follows from (4.22)–(4.24) and (4.30). Hence, the induction proof is complete and Theorem 4.1 is established.


5. Application of Theorem 4.1

The result of Theorem 4.1 can be applied to solve the following problem: Given a finite state, irreducible, stationary first-order Markov chain $\{Z_k\}_{k \ge 1}$, an initial pattern $A$, and a finite set of observable patterns $A_1, A_2, \ldots, A_n$, each of which is not a subpattern of $A$, find the expected value of the number of observations, $T$, beyond $A$ at which a pattern in the collection first appears. In addition, for each pattern $A_i$, find the probability, $p_i$, that it is the first pattern to appear. The solution, which also applies in the case where there is no starting pattern, is a direct application of the results in Li (1980, Theorem 3.1) and Gerber and Li (1981, see pp. 102–103), which were derived for the case of i.i.d. sequences of observations. The proof in the context of this section is identical to that in Gerber and Li (1981). For the purpose of completeness, we state the result here:

Theorem 5.1. Let $T_i$ represent the time at which the pattern $A_i$ first appears, for $i = 1, 2, \ldots, n$. Let $T = \min\{T_1, T_2, \ldots, T_n\}$. For $i \ne j$ and $i, j = 1, 2, \ldots, n$, define $e_{ji}$ by $e_{ji} = EN(A_j, A_i)$. Let $e_{ii} = 0$ and let $e_i = ET_i$ for $i = 1, 2, \ldots, n$. Then $\{ET, p_1, p_2, \ldots, p_n\}$ satisfies the system of $n + 1$ linear equations described in matrix form by

$$M (ET, p_1, \ldots, p_n)' = (1, e_1, \ldots, e_n)', \tag{5.1}$$

where

$$M = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & e_{11} & e_{21} & \cdots & e_{n1} \\ 1 & e_{12} & e_{22} & \cdots & e_{n2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & e_{1n} & e_{2n} & \cdots & e_{nn} \end{pmatrix} \tag{5.2}$$

is invertible and $v'$ represents the transpose of the row-vector $v$.

Example 5.1. Let $\{Z_j\}_{j \ge 1}$ be a Markov chain with state space $S = \{1, 2, 3\}$ and transition probability matrix

$$P = \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 1/3 & 1/6 & 1/2 \\ 5/12 & 7/12 & 0 \end{pmatrix}. \tag{5.3}$$

Let $A$ be the pattern 113 and let $A_1$ and $A_2$ be the patterns 322 and 131, respectively. We apply Theorems 4.1 and 5.1 to calculate $ET$, the expected time until one of the patterns appears, and the probability, for each pattern, that it is the first to appear.

5.1. Solution to Example 5.1

Since the transition probability matrix $P$ is doubly stochastic, the stationary distribution $\pi$ is uniform on $\{1, 2, 3\}$. We first compute $e_1$. With the notation of Theorem 4.1, note that $\nu_0 = 1$ and $\nu_1 = 0$. Thus, part (3a) of Theorem 4.1 applies, so that

$$e_1 = EN(A, A_1) = A_1 * A_1 - A * A_1 + EN(2, 3) - EN(3, 3) = (\pi_3 P_{32} P_{22})^{-1} - \pi_3^{-1} + EN(2, 3) - EN(3, 3). \tag{5.4}$$

Clearly, $EN(3, 3) = \pi_3^{-1}$. Using a standard Markov chain calculation, such as a first-step analysis, one obtains the result $EN(2, 3) = 2$. It follows that $e_1 = 188/7$.

For $e_2 = EN(A, A_2)$, we have $\nu_0 = \nu(113, 131) = 2$, $\nu_1 = \nu(31, 13) = 1$, and $\nu_2 = \nu(3, 1) = 0$. Hence, part (3b) of Theorem 4.1 applies with $K = 1$. We obtain

$$e_2 = EN(A, A_2) = A_2 * A_2 - A * A_2 + (-1)^1 [EN(3, 1) - EN(1, 1)] = \pi_1^{-1} + (\pi_1 P_{13} P_{31})^{-1} - (\pi_1 P_{13})^{-1} - [EN(3, 1) - 3]. \tag{5.5}$$

As with the previous calculation, it is easily determined that $EN(3, 1) = 34/13$, and we conclude that $e_2 = 766/65$. The computations for $e_{12}$ and $e_{21}$ are similar. Since $\nu_0 = \nu(A_1, A_2) = 0$, we apply part (2) of Theorem 4.1 and obtain $e_{12} = 1116/65$ and $e_{21} = 216/7$. Thus, the vector $(ET, p_1, p_2)'$ is given by

$$(ET, p_1, p_2)' = M^{-1} (1, e_1, e_2)', \tag{5.6}$$

where

$$M = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 216/7 \\ 1 & 1116/65 & 0 \end{pmatrix}.$$
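The system (5.6) is small enough to solve exactly with rational arithmetic. The sketch below takes the values of $e_1$, $e_2$, $e_{12}$ and $e_{21}$ reported above as given inputs and solves $M (ET, p_1, p_2)' = (1, e_1, e_2)'$ by Gauss-Jordan elimination; the helper `solve` is an illustrative implementation, not part of the paper.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(b)
    M = [list(map(F, row)) + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]

# Values reported in Example 5.1 (taken as inputs, not recomputed here).
e1, e2 = F(188, 7), F(766, 65)
e12, e21 = F(1116, 65), F(216, 7)

M = [[0, 1, 1],        # p1 + p2 = 1
     [1, 0, e21],      # ET + p2 * e21 = e1
     [1, e12, 0]]      # ET + p1 * e12 = e2
ET, p1, p2 = solve(M, [1, e1, e2])
print(ET, p1, p2)      # 3728/607 399/1214 815/1214
```

Working with `Fraction` rather than floating point keeps the result in the same exact form as the values quoted in the text.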

We conclude that $ET = 3728/607 \approx 6.14$, $p_1 = 399/1214 \approx 0.329$, and $p_2 = 815/1214 \approx 0.671$.

Acknowledgement

The second author’s research participation was funded by a Lafayette College Excel grant.

References

Benevento, R.V., 1984. The occurrence of sequence patterns in ergodic Markov chains. Stochastic Process. Appl. 17, 369–373.
Durrett, R., 2005. Probability: Theory and Examples, third ed. Brooks/Cole, Belmont, California.
Fu, J.C., Lou, W.Y.W., 2006. Waiting time distributions of simple and compound patterns in a sequence of r-th order Markov dependent multi-state trials. Ann. Inst. Statist. Math. 58, 291–310.
Gerber, H.U., Li, S.Y.R., 1981. The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stochastic Process. Appl. 11, 101–108.
Glaz, J., Kulldorff, M., Pozdnyakov, V., Steele, J.M., 2006. Gambling teams and waiting times for patterns in two-state Markov chains. J. Appl. Probab. 43, 127–140.
Li, S.Y.R., 1980. A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Ann. Probab. 8, 1171–1176.
Pozdnyakov, V., 2008a. On occurrence of patterns in Markov chains: method of gambling teams. Statist. Probab. Lett. 78, 2762–2767.
Pozdnyakov, V., 2008b. On occurrence of subpattern and method of gambling teams. Ann. Inst. Statist. Math. 60, 193–203.
Resnick, S., 1992. Adventures in Stochastic Processes. Birkhäuser, Boston.
Williams, D., 1991. Probability with Martingales. Cambridge University Press, Cambridge.