Collapsing of non-homogeneous Markov chains


Statistics and Probability Letters 84 (2014) 140–148


Agnish Dey a, Arunava Mukherjea b,∗

a Department of Mathematics, 1400 Stadium Road, Gainesville, FL 32611, United States
b 1216 Labrad Lane, Tampa, FL 33613, United States

Abstract

Let X(n), n ≥ 0, be a (homogeneous) Markov chain with a finite state space S = {1, 2, . . . , m}. Let S be the union of disjoint sets S1, S2, . . . , Sk which form a partition of S. Define Y(n) = i if and only if X(n) ∈ Si for i = 1, 2, . . . , k. Is the collapsed chain Y(n) Markov? This problem was considered by Burke and Rosenblatt in 1958, and in this note the problem is studied when the X(n) chain is non-homogeneous and Markov. To the best of our knowledge, the results here are new. © 2013 Elsevier B.V. All rights reserved.

Article history:
Received 30 August 2012
Received in revised form 18 September 2013
Accepted 2 October 2013
Available online 8 October 2013

Keywords: Non-homogeneous Markov chains; Collapsing of Markov chains; Reversibility; Left invariant initial distribution

1. Introduction

In this paper we study non-homogeneous Markov chains (NHMCs) X(n), n ≥ 0, with finite state space S = {1, 2, . . . , m}, an initial distribution p = (p1, p2, . . . , pm), P(X(0) = i) = pi, and the transition probability matrices Pn, n ≥ 1, given by

(Pn)ij = P(X(n) = j|X(n−1) = i),

in the context of collapsibility. This is an old problem, first studied by Burke and Rosenblatt (1958) for homogeneous Markov chains. The problem can be stated as follows. Let S1, S2, . . . , Sr be r, 1 ≤ r ≤ m, pairwise disjoint subsets of S, each containing more than one state, so that S = S1 ∪ S2 ∪ · · · ∪ Sr ∪ A, where A = S − ∪_{i=1}^r Si. Then the partition of S given by S1, S2, . . . , Sr and the singletons in A defines a collapsed chain Y(n) given by

Y(n) = i if and only if X(n) ∈ Si, and Y(n) = u if and only if X(n) = u,

where n ≥ 0, 1 ≤ i ≤ r, and u ∈ A. The problem we study here is when the collapsed chain Y(n) is Markov. In his book (Rosenblatt, 1971, Chapter III, Section 2), Rosenblatt presented some motivating examples in this context. One simple model motivated by these examples can be described as follows. An experimenter leads a guinea pig into a maze consisting of a straight line path OA, forking at A into three different paths AB, AC, and AD; each path takes the guinea pig back to OA. A positive stimulus (an appealing food) is left on AB, while a negative stimulus (a mild electric shock) is left on AC, and another negative stimulus (a slightly less mild shock) on AD. After observing the guinea pig's journey along this maze a large number of times, the experimenter decides on a 3-state



Corresponding author. E-mail address: [email protected] (A. Mukherjea).

0167-7152/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.spl.2013.10.002


Markov chain model to describe the ''learning'' behavior of the guinea pig. To simplify the model even further, he now has the problem of deciding whether collapsing the paths with negative stimuli can give him a 2-state Markov chain.

Many papers on functions of Markov chains are available in the printed literature. Some of them are included in the references here. However, all such articles are on homogeneous Markov chains. Similar interesting and non-trivial questions can be asked for NHMCs. In this paper, we address some of them. While we are able to solve some of these problems, some interesting problems still remain. For example, how do the present results generalize in the case of a Markov process with state space either denumerable or continuous? Each of the references Abdel-Moniem and Leysieffer (1982), Glover and Mitro (1990), Iosifescu (1979), Kemeny and Snell (1960), Rogers and Pitman (1981), Rosenblatt (1973) and Rubino and Sericola (1989) is relevant in some respect to the present paper, though not discussed here directly. In Section 2, we discuss a few relevant examples and some necessary definitions. Section 3 is the main section, where our results are presented.

2. Definitions and examples

Definition 1. The initial distribution vector p = (p1, p2, . . . , pm), ∑_{i=1}^m pi = 1, 0 ≤ pi ≤ 1 for each i, P(X(0) = i) = pi, where X(n) is a NHMC, is called left invariant if for each n ≥ 1, pPn = p.

Note that for 1 ≤ i ≤ m, 1 ≤ j ≤ m, n ≥ 1,

(Pn)ij = P(X(n) = j|X(n−1) = i).

Thus if p is left invariant, then for n ≥ 1, p^(n) = p, where p^(n)(i) = P(X(n) = i). Notice that the uniform distribution vector p = (1/m, 1/m, . . . , 1/m) is left invariant if and only if each Pn is an m × m bi-stochastic matrix (that is, a matrix for which each row sum is 1 and each column sum is 1). Consider the m × m diagonal matrix D such that Dii = pi > 0, 1 ≤ i ≤ m.

Definition 2. The NHMC X(n) is called reversible if and only if DPn = PnT D for each n ≥ 1.
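The reversibility condition of Definition 2 is easy to check mechanically. The following sketch is our own illustration (the 2 × 2 matrix is invented to satisfy detailed balance, it is not an example from the paper); it verifies DPn = PnT D and the left invariance that, as shown next, reversibility forces.

```python
# Sketch (not from the paper): checking reversibility D P = P^T D
# for a hand-picked 2-state chain satisfying detailed balance.
import numpy as np

def is_reversible(p, P, tol=1e-12):
    """Return True if D P = P^T D, where D = diag(p)."""
    D = np.diag(p)
    return np.allclose(D @ P, P.T @ D, atol=tol)

# Detailed balance: p_i P_ij = p_j P_ji (here 0.25*0.6 == 0.75*0.2).
p = np.array([0.25, 0.75])
P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
assert is_reversible(p, P)

# Reversibility forces left invariance: p P = p.
assert np.allclose(p @ P, p)
```

The same helper returns False for any stochastic matrix that breaks detailed balance with respect to p, which is a convenient way to test candidate chains.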



If a NHMC X(n), n ≥ 0, is reversible, then its initial distribution p must be left invariant. The reason is the following. Reversibility implies that for all i, j, 1 ≤ i ≤ m, 1 ≤ j ≤ m, and n ≥ 1,

(DPn)ij = (PnT D)ij, or pi(Pn)ij = pj(Pn)ji.

Thus,

∑_{j=1}^m pi(Pn)ij = ∑_{j=1}^m pj(Pn)ji, or pi = (pPn)i, 1 ≤ i ≤ m,

implying p = pPn for n ≥ 1. Thus, left invariance is relevant when we deal with reversibility; recall that p^(n) = p for n ≥ 1 if p is left invariant. We should also mention that it follows immediately that for a reversible NHMC X(n), P(X(n) = i, X(n−1) = j) = P(X(n) = j, X(n−1) = i) for n ≥ 1, 1 ≤ i ≤ m, 1 ≤ j ≤ m. The converse also holds. Let us also mention that there may not exist any left invariant distribution vector for a NHMC X(n). A very simple example is the following: suppose that for a particular 2-state NHMC,



Pn = [ 1/3  2/3 ]
     [ 1/3  2/3 ]   for n odd;

Pn = [ 1/3  2/3 ]
     [ 2/3  1/3 ]   for n even.
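The claim that no single left invariant vector serves both matrices can be confirmed numerically. The sketch below is our own check (using the two matrices of this example, for n odd and n even respectively): it computes the left invariant vector of each and shows they differ.

```python
# Sketch (our check, not from the paper): the two transition matrices
# of this 2-state NHMC have different left invariant vectors.
import numpy as np

P_odd  = np.array([[1/3, 2/3],
                   [1/3, 2/3]])
P_even = np.array([[1/3, 2/3],
                   [2/3, 1/3]])

def invariant(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum 1."""
    w, V = np.linalg.eig(P.T)
    v = V[:, np.argmin(np.abs(w - 1))].real
    return v / v.sum()

assert np.allclose(invariant(P_odd),  [1/3, 2/3])
assert np.allclose(invariant(P_even), [1/2, 1/2])
# The two invariant vectors differ, so no single p works for all n.
```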


Let p = (p1, p2), 0 ≤ p1 ≤ 1, 0 ≤ p2 ≤ 1, p1 + p2 = 1, be such that pPn = p for all n ≥ 1. Then since lim_{k→∞} Pn^k = Pn for n odd and

lim_{k→∞} Pn^k = [ 1/2  1/2 ]
                 [ 1/2  1/2 ]   for n even,

it follows that for n odd, p = lim_{k→∞} pPn^k = (1/3, 2/3), and for n even, similarly, p = (1/2, 1/2). Thus p cannot exist.

Let us now present a few examples of NHM chains. Non-homogeneous (or non-stationary) contexts appear when transition probabilities are time-dependent. A very simple example (very relevant in this paper; see our Theorem 4) of a NHMC W(n) is given by P(W(n) = j|W(n−1) = i) = αn Pij + (1 − αn)Qij, where 0 ≤ αn ≤ 1 and P, Q are two stochastic matrices. Another simple example of a finite state NHMC results from a simple modification of the classical Polya chain. Consider an urn with one white ball and one black ball. One ball is drawn at random, its color is noted, and this ball, along with a new ball of the same color, is placed in the urn. The process is continued. The chain Z(n) is defined as follows: if i is the number of white balls drawn from the urn in the first n drawings, then for n ≥ 3,



Z(n) = i if 0 ≤ i ≤ 3, and Z(n) = 3 if i > 3.

Then we take X(n) = Z(n + 3) for n ≥ 0. It follows easily that X(n), n ≥ 0, is a NHMC with state space {0, 1, 2, 3} such that

pi = P(X(0) = i) = 1/4, i = 0, 1, 2, 3,

and if (Pn)ij = P(X(n) = j|X(n−1) = i), then we have

Pn = [ (n+3)/(n+4)   1/(n+4)        0              0       ]
     [ 0             (n+2)/(n+4)    2/(n+4)        0       ]
     [ 0             0              (n+1)/(n+4)    3/(n+4) ]
     [ 0             0              0              1       ].

Note that the initial distribution vector p = (1/4, 1/4, 1/4, 1/4) is not left invariant here (that is, pPn ≠ p for n > 0). However, NHMCs where the initial distribution vector p is left invariant are numerous. For example, where Pn is given by





Pn = [ an   bn   1 − an − bn ]
     [ cn   dn   1 − cn − dn ]
     [ en   fn   1 − en − fn ]

such that Pn is stochastic, an + cn + 2en = 1, and bn + dn + 2fn = 1. Here, if p = (1/4, 1/4, 1/2), then pPn = p for n > 0. Other examples appear in the next section to justify various assumptions made in the results there.

3. Collapsibility of NHMCs: results

In this section, we present four theorems and a number of examples. Let X(n), n ≥ 0, be a NHMC with state space S = {1, 2, . . . , m}. Theorem 1 provides a sufficient condition (that can easily be checked) for the collapsed chain Y(n), n ≥ 0, to be Markov for all possible initial distributions p of X(0). Example 1 shows that this condition is not necessary. Theorem 2 shows that when the X(n) chain is reversible with respect to p, the given initial distribution being necessarily left invariant, the sufficient condition in Theorem 1 is also necessary. Theorem 3 is our main result. It introduces various other natural conditions for the Y(n) chain to be Markov. Theorem 4 characterizes the transition matrices Pn of X(n) in the case when Y(n) ≡ f(X(n)), where f is any given function from S into S, is Markov. Our last example shows that the reversibility condition cannot be removed in Theorem 4.

Theorem 1. Let X(n), n ≥ 0, be a NHMC with any given initial distribution vector p and the state space S = {1, 2, . . . , m}. Let 1 ≤ r ≤ m, and S1, S2, . . . , Sr be r pairwise disjoint subsets of S such that S = ∪_{i=1}^r Si. Let Y(n), n ≥ 0, be the collapsed chain defined by Y(n) = i if and only if X(n) ∈ Si. Then a sufficient condition for Y(n) to be Markov is that

(C) Pn(k, Sj) ≡ P(X(n) ∈ Sj|X(n−1) = k) is independent of k in Si, for 1 ≤ i ≤ r, 1 ≤ j ≤ r, i ≠ j, n ≥ 2.
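Condition (C) can be illustrated numerically. In the sketch below (our own construction, not from the paper; the matrices are invented so that the rows of the block S2 = {2, 3}, relabeled {1, 2}, assign the same mass to S1), the Markov property of the collapsed chain is verified by brute-force enumeration of path probabilities.

```python
# Sketch (our illustration): a NHMC satisfying condition (C), whose
# collapsed chain is verified to be Markov by exact enumeration.
import numpy as np
from itertools import product

S = [0, 1, 2]                   # relabeled states; S1 = {0}, S2 = {1, 2}
blocks = [[0], [1, 2]]
p0 = np.array([0.2, 0.5, 0.3])  # an arbitrary initial distribution

def Pn(n):
    # Rows for states 1 and 2 (the block S2) both give mass 0.3 to
    # state 0 (the block S1), so condition (C) holds for every n.
    eps = 0.1 / n
    return np.array([[0.5, 0.25 + eps, 0.25 - eps],
                     [0.3, 0.4,        0.3],
                     [0.3, 0.2,        0.5]])

def joint_Y(path):
    """P(Y(0)=path[0], ..., Y(n)=path[n]), summing over all X-paths."""
    n = len(path) - 1
    total = 0.0
    for xs in product(S, repeat=n + 1):
        if any(xs[t] not in blocks[path[t]] for t in range(n + 1)):
            continue
        pr = p0[xs[0]]
        for t in range(1, n + 1):
            pr *= Pn(t)[xs[t - 1], xs[t]]
        total += pr
    return total

# P(Y(2)=a | Y(1)=b, Y(0)=c) should not depend on c when (C) holds.
for a, b in product([0, 1], repeat=2):
    conds = [joint_Y([c, b, a]) / joint_Y([c, b]) for c in (0, 1)]
    assert abs(conds[0] - conds[1]) < 1e-12
```

Replacing the two 0.3 entries in the first column by unequal values makes the assertion fail, which matches the role condition (C) plays in the theorem.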




Proof. Let i0, i1, . . . , in be any (n + 1) elements chosen from {1, 2, . . . , r}. Then we must show that, for n ≥ 2,

P(Y(n) = in|Y(n−1) = in−1, . . . , Y(0) = i0) = P(Y(n) = in|Y(n−1) = in−1).

To show this, let us write

P(Y(n) = in, Y(n−1) = in−1, . . . , Y(0) = i0) = P(X(n) ∈ Sin, X(n−1) ∈ Sin−1, . . . , X(0) ∈ Si0)
= ∑_{xn−1 ∈ Sin−1} P(X(n) ∈ Sin|X(n−1) = xn−1) × P(X(n−1) = xn−1, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= ∑_{xn−1 ∈ Sin−1} Pn(xn−1, Sin) × P(X(n−1) = xn−1, . . . , X(0) ∈ Si0).

When condition (C) holds, Pn(xn−1, Sin) is independent of xn−1 in Sin−1 even when in = in−1, because then Pn(xn−1, Sin) = 1 − ∑_{Sj ⊂ S − Sin} Pn(xn−1, Sj). Thus we have

P(Y(n) = in, Y(n−1) = in−1, . . . , Y(0) = i0) = Pn(xn−1, Sin) × P(X(n−1) ∈ Sin−1, . . . , X(0) ∈ Si0)
= Pn(xn−1, Sin) × P(Y(n−1) = in−1, . . . , Y(0) = i0),

and this means that P(Y(n) = in|Y(n−1) = in−1, . . . , Y(0) = i0) = P(Y(n) = in|Y(n−1) = in−1), which is given by the quantity Pn(xn−1, Sin) for any xn−1 in Sin−1.



The following example shows that in Theorem 1 the sufficient condition may not be necessary, even when Y(n) is Markov for all possible initial distributions p for a given NHMC X(n).

Example 1. Let X(n), n ≥ 0, be a NHMC with state space S = {1, 2, 3}, with any given initial distribution p, such that for n ≥ 1, the second column of each Pn is a positive multiple of its third column, and furthermore, (Pn)21 ≠ (Pn)31. Let S1 = {1}, S2 = {2, 3}, and let Y(n) be the chain with states 1 and 2 such that for n ≥ 0, Y(n) = 1 if and only if X(n) = 1, and Y(n) = 2 if and only if X(n) ∈ {2, 3}. Then Y(n) is Markov.

To show this, let i0, i1, . . . , in be any (n + 1) states from {1, 2}. We consider

P(Y(n) = in, Y(n−1) = in−1, . . . , Y(0) = i0) = P(X(n) ∈ Sin, X(n−1) ∈ Sin−1, . . . , X(0) ∈ Si0).   (1)

We need to show that the expression (1) is equal to P(X(n) ∈ Sin|X(n−1) ∈ Sin−1) × P(X(n−1) ∈ Sin−1, . . . , X(0) ∈ Si0). If in−1 = 1 (that is, Sin−1 = {1}), this is immediate. Thus we consider the case when in−1 = 2 (that is, Sin−1 = {2, 3}). In this case,

P(X(n) ∈ Sin, X(n−1) ∈ {2, 3}, . . . , X(0) ∈ Si0)
= P(X(n) ∈ Sin|X(n−1) = 2) × P(X(n−1) = 2, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
+ P(X(n) ∈ Sin|X(n−1) = 3) × P(X(n−1) = 3, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= (Pn)21 An + (Pn)31 Bn, when in = 1 (that is, Sin = {1});
= [1 − (Pn)21]An + [1 − (Pn)31]Bn, when in = 2 (that is, Sin = {2, 3}),

where An = P(X(n−1) = 2, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0) and Bn = P(X(n−1) = 3, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0). For Y(n) to be Markov, we must have (when in = 1)

(Pn)21 An + (Pn)31 Bn = P(X(n) ∈ Sin|X(n−1) ∈ {2, 3}) × P(X(n−1) ∈ {2, 3}, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= [((Pn)21 A + (Pn)31 B)/(A + B)] × [An + Bn],

where we have taken A = P(X(n−1) = 2), B = P(X(n−1) = 3), suppressing n for the moment. Simplifying the above equation (whether in = 1 or 2), we see that Y(n) will be Markov if and only if for n ≥ 2, we have

((Pn)21 − (Pn)31) [An/Bn − P(X(n−1) = 2)/P(X(n−1) = 3)] = 0.   (2)

Notice that

An/Bn = P(X(n−1) = 2, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0) / P(X(n−1) = 3, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0).


When Sin−2 = {1},

An/Bn = (Pn−1)12 P(X(n−2) = 1, . . . , X(0) ∈ Si0) / [(Pn−1)13 P(X(n−2) = 1, . . . , X(0) ∈ Si0)] = (Pn−1)12/(Pn−1)13 ≡ tn, say,

and when Sin−2 = {2, 3},

An/Bn = [(Pn−1)22 P(X(n−2) = 2, . . . , X(0) ∈ Si0) + (Pn−1)32 P(X(n−2) = 3, . . . , X(0) ∈ Si0)] / [(Pn−1)23 P(X(n−2) = 2, . . . , X(0) ∈ Si0) + (Pn−1)33 P(X(n−2) = 3, . . . , X(0) ∈ Si0)] = tn (see above),

since the second column of Pn−1 is tn times its third column. Notice that this also means that

P(X(n−1) = 2) = P(X(n−2) = 1)(Pn−1)12 + P(X(n−2) = 2)(Pn−1)22 + P(X(n−2) = 3)(Pn−1)32
= tn [P(X(n−2) = 1)(Pn−1)13 + P(X(n−2) = 2)(Pn−1)23 + P(X(n−2) = 3)(Pn−1)33]
= tn P(X(n−1) = 3),

so that tn = P(X(n−1) = 2)/P(X(n−1) = 3) (= A/B), for all possible distributions of X(n−2). Thus, Eq. (2) holds even when (Pn)21 ≠ (Pn)31, and thus, Y(n) is Markov. Here the sufficiency condition of Theorem 1 is not satisfied because (Pn)21 ≠ (Pn)31.

The next theorem shows when the condition Pn(i, Sj) = Pn(k, Sj), 1 ≤ j ≤ r, for any i and k in Su, 1 ≤ u ≤ r, found sufficient in Theorem 1 for Y(n) to be Markov, is also necessary. We will need to use reversibility in Theorem 2. An example following Theorem 2 shows that the reversibility condition is necessary in Theorem 2. As we showed earlier, reversibility implies that the initial distribution p is left invariant. Thus, in Theorem 2, the vector p is necessarily left invariant.

Theorem 2. Let X(n), n ≥ 0, be a reversible NHMC with state space S = {1, 2, . . . , m} having Pn = Pn+1 for each odd n, and let Si, 1 ≤ i ≤ r, be r pairwise disjoint subsets of S forming a partition of S, so that S = ∪_{i=1}^r Si. We form the collapsed chain Y(n), n ≥ 0, such that Y(n) = i if and only if X(n) ∈ Si, 1 ≤ i ≤ r. We assume that Y(n) is Markov and each pi > 0, 1 ≤ i ≤ r. Then the following is true: for any j, k, 1 ≤ j ≤ r, 1 ≤ k ≤ r, and any n ≥ 1, Pn(i, Sj) is independent of i in Sk.

Proof. Suppose that Y(n), n ≥ 0, is Markov. Let

(Qn)ij = P(Y(n) = j|Y(n−1) = i) = P(X(n) ∈ Sj|X(n−1) ∈ Si) = (1/p(Si)) ∑_{k∈Si} pk Pn(k, Sj),

where p(Si) = P(X(0) ∈ Si) and Pn(k, Sj) = P(X(n) ∈ Sj|X(n−1) = k). Notice that if we define the m × r matrix B by Bkj = 1 if k ∈ Sj, and Bkj = 0 otherwise, then we have

(Pn B)kj = Pn(k, Sj).   (3)

If we define the r × m matrix A by

Aik = pk/p(Si) if k ∈ Si; Aik = 0 otherwise,   (4)

then we have (APn B)ij = ∑_{k∈Si} Aik (Pn B)kj = (Qn)ij. Thus, we have

Qn = APn B, n ≥ 1,   (5)

and similarly, using the Markov property of Y(n),

Qn Qn+1 = APn Pn+1 B, n ≥ 1.   (6)

From (5) and (6), we have

Vn ≡ APn (I − BA) Pn+1 B = 0, n ≥ 1,   (7)

where I is the m × m identity matrix and 0 on the right is the r × r zero matrix. Let C be the r × r diagonal matrix with Cii = p(Si) = P(X(0) ∈ Si), 1 ≤ i ≤ r. Then it follows from (4) that (CA)ik = pk if k ∈ Si, and (CA)ik = 0 otherwise, so that CA = BT D. It follows from (7), after using the reversibility condition DPn = PnT D, that

BT PnT [D(I − BA)] Pn+1 B = 0.   (8)


Also, the m × m matrix D(I − BA) is positive semi-definite, since it is easily verified that for any non-zero 1 × m vector x,

x[D(I − BA)]xT = ∑_{k=1}^r (1/p(Sk)) ∑_{i<j; i,j∈Sk} pi pj (xi − xj)^2 ≥ 0.

Thus, using Higham (1990), we can write D(I − BA) = RT R for some m × m matrix R, and then it follows from (8) that for each n odd,

(RPn B)T (RPn B) = 0,   (9)

since by our assumption in the theorem, Pn = Pn+1. This means that RPn B = 0, and therefore RT RPn B = 0, or (I − BA)Pn B = 0, which implies that

Pn B = BQn.   (10)

It follows immediately from (3) that for any k ∈ Si, and 1 ≤ i ≤ r, 1 ≤ j ≤ r, Pn(k, Sj) = (Qn)ij for all n ≥ 1.



Here we present an example of a NHMC X(n) and a corresponding collapsed chain Y(n) such that the sufficient condition in Theorem 1 for Y(n) to be Markov is shown to be not necessary, even when the initial distribution p is uniform and left invariant. This example shows that the reversibility condition cannot be removed from Theorem 2.

Example 2. Let X(n), n ≥ 0, be a NHMC with 3 states such that S = {1, 2, 3}, p, the distribution of X(0), is equal to (1/3, 1/3, 1/3), and each Pn, n ≥ 1, is bi-stochastic and has the property

(Pn)21 ≠ (Pn)31,  (Pn)12 = (Pn)13.   (11)

Notice that in this case, the initial distribution is left invariant (since pPn = p for n ≥ 1). Also, by (11), Pn is not symmetric for n ≥ 1, and that implies that in this case X(n) is not reversible. Let S1 = {1} and S2 = {2, 3}. We define, for n ≥ 0: Y(n) = 1 if and only if X(n) = 1, and Y(n) = 2 if and only if X(n) ∈ {2, 3}. Clearly Pn(i, S1) = P(X(n) = 1|X(n−1) = i) = (Pn)i1 is not independent of i because of (11). Thus the sufficient condition of Theorem 1 does not hold here. Let us show that Y(n) is Markov. To this end, let i0, i1, . . . , in be any (n + 1) states of Y(n). Then we must show that

P(Y(n) = in|Y(n−1) = in−1, . . . , Y(0) = i0) = P(Y(n) = in|Y(n−1) = in−1).   (12)

Case 1: Suppose in−1 = 1. In this case,

P(Y(n) = in, Y(n−1) = 1, Y(n−2) = in−2, . . . , Y(0) = i0)
= P(X(n) ∈ Sin, X(n−1) = 1, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= P(X(n) ∈ Sin|X(n−1) = 1) P(X(n−1) = 1, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= P(Y(n) = in|Y(n−1) = 1) P(Y(n−1) = 1, Y(n−2) = in−2, . . . , Y(0) = i0),

which is possible in this case because X(n) is Markov and Sin−1 is a singleton.

Case 2: Suppose in−1 = 2, so that Sin−1 = {2, 3}. In this case, we write

P(Y(n) = in, Y(n−1) = in−1, . . . , Y(0) = i0)
= P(X(n) ∈ Sin|X(n−1) = 2) P(X(n−1) = 2, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
+ P(X(n) ∈ Sin|X(n−1) = 3) P(X(n−1) = 3, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0).

Let us write An−1 = P(X(n−1) = 2, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0) and Bn−1 = P(X(n−1) = 3, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0). Thus if in = 1, then

P(Y(n) = 1, Y(n−1) = 2, Y(n−2) = in−2, . . . , Y(0) = i0) = (Pn)21 An−1 + (Pn)31 Bn−1.

Also,

P(Y(n) = 1|Y(n−1) = 2) = P(X(n) = 1|X(n−1) ∈ {2, 3}) = [(Pn)21 + (Pn)31](1/3) / (1/3 + 1/3) = (1/2)[(Pn)21 + (Pn)31],

since P(X(n−1) = 2) = P(X(n−1) = 3) = 1/3, as the initial distribution is left invariant. Thus (12) holds when in = 1 and in−1 = 2 if and only if

[(Pn)21 An−1 + (Pn)31 Bn−1] / (An−1 + Bn−1) = (1/2)[(Pn)21 + (Pn)31], or ((Pn)21 − (Pn)31)(An−1 − Bn−1) = 0.   (13)


Note that Eq. (13) was obtained above by assuming in = 1 and in−1 = 2. It is also clear that we would get the same equation even if we assumed that in = 2 and in−1 = 2. Thus, in order to establish (12) in the case in−1 = 2, we must prove that for all n ≥ 2, An−1 = Bn−1 (since (Pn)21 ≠ (Pn)31 for all n ≥ 1). In other words, we must prove that for n ≥ 2,

P(X(n−1) = 2, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0) = P(X(n−1) = 3, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0).   (14)

When n = 2, (14) reduces to P(X(1) = 2, X(0) ∈ Si0) = P(X(1) = 3, X(0) ∈ Si0). When i0 = 1, this means that (P1)12 = (P1)13, and when i0 = 2, this means that (P1)22 + (P1)32 = (P1)23 + (P1)33. Since these last two equations hold, as (Pn)12 = (Pn)13 and Pn is bi-stochastic for n ≥ 1, it is clear that (14) holds when n = 2. The proof will be complete by an induction argument once we show that (14) follows for any n > 2, that is, if we can prove that (14) follows if we assume that

P(X(n−2) = 2, X(n−3) ∈ Sin−3, . . . , X(0) ∈ Si0) = P(X(n−2) = 3, X(n−3) ∈ Sin−3, . . . , X(0) ∈ Si0).   (15)

Thus, we assume that (15) holds and n > 2. We are done once we show that (15) implies (14). To this end, first suppose that in−2 = 1. In this case, (14) reduces to (Pn−1)12 P(X(n−2) = 1, X(n−3) ∈ Sin−3, . . . , X(0) ∈ Si0) = (Pn−1)13 P(X(n−2) = 1, X(n−3) ∈ Sin−3, . . . , X(0) ∈ Si0), which holds trivially since (Pn−1)12 = (Pn−1)13 for n > 1. In the case in−2 = 2, (14) reduces to

P(X(n−1) = 2, X(n−2) = 2, . . . , X(0) ∈ Si0) + P(X(n−1) = 2, X(n−2) = 3, . . . , X(0) ∈ Si0)
= P(X(n−1) = 3, X(n−2) = 2, . . . , X(0) ∈ Si0) + P(X(n−1) = 3, X(n−2) = 3, . . . , X(0) ∈ Si0).

This last equation is equivalent to

(Pn−1)22 P(X(n−2) = 2, . . . , X(0) ∈ Si0) + (Pn−1)23 P(X(n−2) = 3, . . . , X(0) ∈ Si0)
= (Pn−1)32 P(X(n−2) = 2, . . . , X(0) ∈ Si0) + (Pn−1)33 P(X(n−2) = 3, . . . , X(0) ∈ Si0).

This last equation, of course, holds by our induction hypothesis (15) and the fact that (Pn−1)22 + (Pn−1)32 = (Pn−1)23 + (Pn−1)33. Thus we have established (12), and consequently Y(n) is Markov, despite (Pn)21 ≠ (Pn)31. Finally, we remark that in this example, if the initial distribution is uniform and left invariant, and (Pn)21 ≠ (Pn)31, then the same Y(n) chain is Markov if and only if (Pn)12 = (Pn)13 for all n ≥ 1. We have left out the details of the proof.

The next theorem is our main result. It is a general theorem presenting conditions for the collapsed chain Y(n) to be Markov.

Theorem 3. Let X(n), n ≥ 0, be a NHMC with finite state space S, |S| = m. Let S1, S2, . . . , Sr be r, 1 ≤ r < m, pairwise disjoint subsets of S such that |Si| > 1 for each i, and S1, S2, . . . , Sr, together with the singletons in A, form a partition of S, where A = S − ∪_{i=1}^r Si. The collapsed chain Y(n) is defined as follows: Y(n) = j if and only if X(n) ∈ Sj, 1 ≤ j ≤ r; Y(n) = u if and only if X(n) = u, u ∈ A, for all n ≥ 0. Then the following results hold:

(i) The condition

Pn(k, Si) ≡ P(X(n) ∈ Si|X(n−1) = k) = 0,  k ∉ Si, 1 ≤ i ≤ r,   (16)

is a sufficient condition for Y(n) to be Markov. It is, however, not necessary for Y(n) to be Markov for any initial distribution p of X(0).

(ii) Let v1 and v2 be any two elements from the state space of Y(n). If Y(n) is Markov, then we have, for n ≥ 1 and given j, 1 ≤ j ≤ r,

∑_{l∈Sj} P(X(n) = l|X(n−1) ∈ Sv1) P(X(n+1) ∈ Sv2|X(n) = l)
= P(X(n) ∈ Sj|X(n−1) ∈ Sv1) P(X(n+1) ∈ Sv2|X(n) ∈ Sj).   (17)

(iii) Let k be any state in S and v2 be as in (ii) above. Then the condition

∑_{l∈Sj} P(X(n) = l|X(n−1) = k) P(X(n+1) ∈ Sv2|X(n) = l)
= P(X(n) ∈ Sj|X(n−1) = k) P(X(n+1) ∈ Sv2|X(n) ∈ Sj),   (18)

whenever 1 ≤ j ≤ r and n ≥ 1, is a sufficient condition for Y(n) to be Markov for all possible initial distributions p of X(0). In general, condition (18) is not necessary for Y(n) to be Markov for a given initial distribution p. However, it is necessary for Y(n) to be Markov for a given initial distribution p (necessarily left invariant) with respect to which the X(n) chain is reversible, assuming, of course, that Pn = Pn+1 for each odd n.

Proof. (i) Assume that condition (16) holds. Let i0, i1, . . . , in be any (n + 1) states for Y(n). If in−1 is a singleton in A, it is easy to verify that

P(Y(n) = in|Y(n−1) = in−1, . . . , Y(0) = i0) = P(X(n) ∈ Sin|X(n−1) = in−1, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= P(X(n) ∈ Sin|X(n−1) = in−1) = P(Y(n) = in|Y(n−1) = in−1).


If in−1 is not a singleton in A, then in−1 = j, 1 ≤ j ≤ r. In this case, condition (16) implies that for P(Y(n−1) = in−1, Y(n−2) = in−2, . . . , Y(0) = i0) = P(X(n−1) ∈ Sj, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0) to be positive, it is necessary that i0 = i1 = · · · = in−1 = j. This means that P(Y(n−1) = in−1, . . . , Y(0) = i0) = P(X(n−1) ∈ Sj, . . . , X(1) ∈ Sj, X(0) ∈ Sj) = P(X(n−1) ∈ Sj, . . . , X(1) ∈ Sj), since P(X(1) ∈ Sj|X(0) ∉ Sj) = 0 by condition (16), when P(X(0) ∉ Sj) > 0. Repeating this argument, we have P(X(n−1) ∈ Sj, X(n−2) ∈ Sj, . . . , X(0) ∈ Sj) = P(X(n−1) ∈ Sj). Thus, it is clear that Y(n) is Markov in this case, because both sides of (12) become P(Y(n) = j)/P(Y(n−1) = j). Example 1 shows that Y(n) in that example is Markov for all initial distributions when S = {1, 2, 3}, S1 = {1} and S2 = {2, 3}, and yet, for any n ≥ 1, Pn(1, {2, 3}) need not be zero. Thus condition (16) may not be necessary for Y(n) to be Markov.

(ii) Suppose that Y(n) is Markov. Here each of Sv1 and Sv2 is either some Sk, 1 ≤ k ≤ r, or one of the singletons in A. Then for a given j, 1 ≤ j ≤ r, we have for n ≥ 1:

P(X(n) ∈ Sj|X(n−1) ∈ Sv1) P(X(n+1) ∈ Sv2|X(n) ∈ Sj)
= P(Y(n) = j|Y(n−1) = v1) P(Y(n+1) = v2|Y(n) = j)
= P(Y(n) = j|Y(n−1) = v1) P(Y(n+1) = v2|Y(n) = j, Y(n−1) = v1)
= P(Y(n+1) = v2, Y(n) = j|Y(n−1) = v1)
= ∑_{l∈Sj} P(X(n+1) ∈ Sv2, X(n) = l|X(n−1) ∈ Sv1)
= ∑_{l∈Sj} P(X(n+1) ∈ Sv2|X(n) = l) P(X(n) = l|X(n−1) ∈ Sv1).

(iii) We assume condition (18). We show that Y(n) must be Markov for all possible initial distributions. Let i0, i1, . . . , in be any (n + 1) states for the Y(n) chain. Again, as before, when in−1 is a singleton in A, it is easy to check that

P(Y(n) = in|Y(n−1) = in−1, . . . , Y(0) = i0) = P(X(n) ∈ Sin|X(n−1) = in−1, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= P(X(n) ∈ Sin|X(n−1) = in−1) = P(Y(n) = in|Y(n−1) = in−1).

When in−1 ∉ A, then in−1 = j for some j, 1 ≤ j ≤ r. In this case, we have

P(Y(n) = in, Y(n−1) = j, Y(n−2) = in−2, . . . , Y(0) = i0)
= P(X(n) ∈ Sin, X(n−1) ∈ Sj, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0)
= ∑_{k∈Sin−2} ∑_{l∈Sj} P(X(n) ∈ Sin|X(n−1) = l) P(X(n−1) = l|X(n−2) = k) × P(X(n−2) = k, X(n−3) ∈ Sin−3, . . . , X(0) ∈ Si0)
= ∑_{k∈Sin−2} P(X(n) ∈ Sin|X(n−1) ∈ Sj) P(X(n−1) ∈ Sj|X(n−2) = k) × P(X(n−2) = k, X(n−3) ∈ Sin−3, . . . , X(0) ∈ Si0),

where we have used condition (18) and the Markov property of X(n). Notice that the last expression above is equal to P(X(n) ∈ Sin|X(n−1) ∈ Sj) P(X(n−1) ∈ Sj, X(n−2) ∈ Sin−2, . . . , X(0) ∈ Si0). It follows that P(Y(n) = in|Y(n−1) = in−1, . . . , Y(0) = i0) = P(Y(n) = in|Y(n−1) = in−1). Thus, the Y(n) chain is Markov.

Let us now justify the final assertions in (iii). First, note that when X(n) is a reversible NHMC with respect to a given (left invariant) initial distribution, it follows by Theorem 2 that when Y(n) is Markov, the sufficient condition of Theorem 1 must hold, and that sufficient condition immediately implies condition (18). Finally, we remark that for the Markov chain X(n) considered in our Example 2, the Y(n) chain corresponding to the partition {1}, {2, 3} is Markov. However, it can easily be verified that condition (18) does not hold in this case. We omit the calculations.

Theorem 4. Let X(n), n ≥ 0, be a reversible Markov chain with finite state space S, p = (p1, p2, . . . , pm) its initial distribution vector (with each pi > 0), and for each n ≥ 1, n odd, Pn = Pn+1. Then for any function f : S → S, the collapsed chain Y(n) ≡ f(X(n)) is Markov if and only if for each n ≥ 1,

(1) Pn = αn I + (1 − αn )U , where |αn | ≤ 1 (αn possibly negative), and U has all its rows identical and equal to p. Also, any NHMC having the form (1) is reversible. 
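Before the proof, a quick numerical illustration (our own; the vector p and the values of αn below are arbitrary choices, not from the paper) that matrices of the form (1) are reversible and leave p invariant:

```python
# Sketch (our check): P_n = a_n I + (1 - a_n) U, with every row of U
# equal to p, satisfies D P_n = P_n^T D and p P_n = p.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
U = np.tile(p, (3, 1))           # rank-one stochastic, rows equal to p
D = np.diag(p)

for a in (-0.2, 0.0, 0.7, 1.0):  # a_n may be negative, |a_n| <= 1
    P = a * np.eye(3) + (1 - a) * U
    assert np.allclose(P.sum(axis=1), 1)   # stochastic (entries >= 0 here)
    assert np.allclose(D @ P, P.T @ D)     # reversible
    assert np.allclose(p @ P, p)           # p is left invariant
```

The check works because (DP)ij = a pi δij + (1 − a) pi pj is symmetric in i and j, which is exactly the detailed balance computation used in the proof below.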


Proof. Suppose S has only two elements and Pn ≠ I, n ≥ 1. Choose αn = (Pn)11 − (Pn)21 = (Pn)22 − (Pn)12. Define Un such that (Un)11 = (Un)21 = (Pn)21/(1 − αn) and (Un)12 = (Un)22 = (Pn)12/(1 − αn), when αn ≠ 1. Choose αn = 1 when Pn = I. Then for n ≥ 1, Pn = αn I + (1 − αn)Un, |αn| ≤ 1, where Un is a stochastic matrix with rank 1. Since X(n) is reversible (by assumption), p is left invariant and, as such, for each n ≥ 1, p = pPn = αn p + (1 − αn)pUn = αn p + (1 − αn)Un^(1), or p = Un^(1), where Un^(1) is the first row of Un. Thus, Un is independent of n, and for n ≥ 1, Pn = αn I + (1 − αn)U, |αn| ≤ 1, where each row of U is p.

Now we assume that X(n) is reversible and S has three or more elements (that is, m ≥ 3). In this case, whenever f is a function from S into S, by Theorem 2 the chain Y(n) ≡ f(X(n)) is Markov if for each n ≥ 1 and three distinct states i, j and k, (Pn)ik = (Pn)jk and (Pn)ii + (Pn)ij = (Pn)ji + (Pn)jj, whenever f(i) = f(j) and f−1(f(k)) = {k}. In other words, for each n ≥ 1, each column of Pn has all of its non-diagonal entries equal, and furthermore, for any two columns of Pn (say, the ith and jth), the differences (Pn)ii − (Pn)ji and (Pn)jj − (Pn)ij are the same. Thus, as in the m = 2 case, we can again choose αn = (Pn)ii − (Pn)ji (so that αn = 1 if and only if Pn = I), and write Pn = αn I + (1 − αn)Un, where |αn| ≤ 1 and (Un)ii = (Un)ji = (Pn)ji/(1 − αn) for all i, j (i ≠ j) in S. Since pPn = p for each n ≥ 1, as in the m = 2 case, each row of Un is equal to p, and Un = U (a rank one stochastic matrix where each row is equal to p). The ''converse'' part is very simple. The proof is omitted.

Our last example shows that the conclusion of Theorem 4 may not hold for a non-reversible NHMC X(n) with three states for which Y(n) ≡ f(X(n)) is Markov for any function f from the state space to itself.

Example 3. Let us consider the NHMC X(n), n ≥ 1, where the initial distribution is uniform and the Pn's are given by

P1 = U = [ 1/3  1/3  1/3 ]
         [ 1/3  1/3  1/3 ]
         [ 1/3  1/3  1/3 ]

P2 = [ 1/2  1/3  1/6 ]
     [ 1/6  1/2  1/3 ]
     [ 1/3  1/6  1/2 ]

and Pn = αn I + (1 − αn)U for n ≥ 3, where −1/2 < αn < 1 for all n. Note


that here the left invariant initial distribution vector is ( 13 , 13 , 31 ) and the reversibility condition fails for P2 since P2 is bistochastic, but not symmetric. However it can easily be verified that the collapsed chain Y (n) ≡ f (X (n)) is Markov for any function f from the state space of X (n) to itself.  Acknowledgments The authors are grateful to Professor Murray Rosenblatt for a number of useful comments. We are also grateful to the referees for pointing out many typos, suggesting rewriting of a number of statements which were not originally sufficiently clear, and finally helping us make this paper more readable. Let us also acknowledge some useful comments from Jim Fill, Goran Hognas and Laurent Miclo. Finally, let us mention that C.C. Lo had some unpublished work in the present context stated for some special Markov chains. References Abdel-Moniem, A.M., Leysieffer, F., 1982. Weak lumpability in finite Markov chains. J. Appl. Probab. 19, 685–691. Burke, C.J., Rosenblatt, M., 1958. A Markovian function of a Markov chain. Ann. Math. Stat. 29, 1112–1122. Glover, Joe, Mitro, Joanna, 1990. Symmetries and functions of Markov processes. Ann. Probab. 18, 655–658. Higham, N., 1990. Analysis of cholesky decomposition of a semi-definite matrix. Reliab. Numer. Comput. 161–185. Iosifescu, M., 1979. Finite Markov Processes and Their Applications. John Wiley and Sons. Kemeny, J.G., Snell, J.L., 1960. Finite Markov Chains. Springer-Verlag. Rogers, L.C.G, Pitman, J.W., 1981. Markov functions. Ann. Probab. 9, 573–582. Rosenblatt, M., 1971. Markov Process Structure and Asymptotic Behavior. Springer-Verlag. Rosenblatt, M., 1973. Random Processes, second edition. Springer-Verlag, New York. Rubino, G., Sericola, B., 1989. On weak lumpability in Markov chains. J. Appl. Probab. 26, 446–457.