L1-Poincaré inequality for discrete time Markov chains


Statistics and Probability Letters 100 (2015) 93–97


Lihua Zhang a,∗, Yingzhe Wang b

a School of Sciences, Beijing University of Posts and Telecommunications, Beijing, 100876, China

b School of Mathematical Sciences, Beijing Normal University, Beijing, 100875, China

Abstract. We introduce a new constant through the L1-Poincaré inequality; it lies between the classical L2-Poincaré constant and the Dobrushin coefficient. Bounds for the L1-Poincaré constant are obtained using Cheeger's technique. © 2015 Elsevier B.V. All rights reserved.

Article history: Received 25 March 2013; Received in revised form 16 January 2015; Accepted 18 January 2015; Available online 12 February 2015.

MSC: 47J10; 58C40; 60J10.

Keywords: Discrete time Markov chain; Dobrushin coefficient; L1-Poincaré inequality; L2-Poincaré inequality.

1. Introduction

Consider a discrete time Markov chain {X_n : n ≥ 0} on a finite or countably infinite state space E. Let P be the transition probability matrix (or Markov kernel) of X_n:

P(x,y) ≥ 0 for all x, y ∈ E,  and  Σ_{y∈E} P(x,y) = 1 for all x ∈ E.

Suppose that the Markov chain has a reversible probability measure µ, that is,

µ(x)P(x,y) = µ(y)P(y,x) for all x, y ∈ E.

We assume throughout this article that P is irreducible.

Ergodicity is one of the central topics in the study of Markov processes, and functional inequalities are an effective tool for studying it (Deng and Song, 2012; Wuebker, 2012). It is well known that the classical L2-Poincaré inequality is equivalent to exponential convergence of the associated Markov semigroup; see Bakry (2002), Chen (1996), Chen and Wang (2000) and Chen (2004). Although the relationship between the L1-Poincaré inequality and the convergence rate of a Markov chain is not yet clear, the study of the L1-Poincaré inequality for Markov chains remains significant. Usually, an inequality in the L1 norm is translated into an L2 problem by the Cauchy–Schwarz inequality (Diaconis, 2009; Saloff-Coste, 2004). Wang (2012) studied the L1-Poincaré inequality directly for continuous time Markov processes. However, the tools used in the continuous time case may not

✩ This work was supported by FRF for the Central Univ. (Grant No. BUPT2013RC0901) and 985 project.



∗ Corresponding author. E-mail address: [email protected] (L. Zhang).

http://dx.doi.org/10.1016/j.spl.2015.01.020 0167-7152/© 2015 Elsevier B.V. All rights reserved.


be useful in the discrete time case. Thus, it becomes important to explore new methods for studying the L1-Poincaré inequality for discrete time Markov chains. In this article, we introduce a new constant via the L1-Poincaré inequality; it lies between the classical L2-Poincaré constant and the Dobrushin coefficient, which is commonly used to describe strong ergodicity of discrete time Markov chains (Anderson, 1991). We also obtain estimates of the optimal constant in the L1-Poincaré inequality by using Cheeger's technique and a segmentation technique. The definition of the Lk-Poincaré inequality is as follows.

Definition 1.1. For k = 1, 2, we say that the Lk-Poincaré inequality holds for P if there exists a constant 0 < C_k < 1 such that for all f with µ(f) = 0,

µ(|Pf|^k) ≤ C_k µ(|f|^k).

For k = 1, 2, we denote by r_k the optimal constant in the Lk-Poincaré inequality, i.e.,

r_k := sup { µ(|Pf|^k) / µ(|f|^k) : µ(f) = 0 }.   (1)

Theorem 1.1. Assume that P is reversible. Then

r_2 ≤ r_1 ≤ δ(P),   (2)

where δ(P) is the Dobrushin coefficient:

δ(P) = (1/2) sup_{x,y} Σ_z |P(x,z) − P(y,z)|.
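As a numerical sanity check (our own illustration, not part of the paper), the chain of inequalities in (2) can be verified on a small reversible chain. Below, r_1 is estimated from below by evaluating the ratio in (1) on the second eigenfunction of P and on random mean-zero test functions, and r_2 is taken, in line with the paper's examples, to be the second-largest absolute eigenvalue of P on L2(µ); the chain itself is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small reversible chain from random symmetric weights:
# P(x,y) = w(x,y)/w(x) with w(x) = sum_y w(x,y); mu(x) = w(x)/sum w.
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2
P = W / W.sum(axis=1, keepdims=True)
mu = W.sum(axis=1) / W.sum()
assert np.allclose(mu[:, None] * P, P.T * mu[None, :])   # detailed balance

def l1_ratio(f):
    """mu(|Pf|) / mu(|f|) for a mean-zero test function f."""
    return (mu * np.abs(P @ f)).sum() / (mu * np.abs(f)).sum()

# Dobrushin coefficient: delta(P) = (1/2) sup_{x,y} sum_z |P(x,z) - P(y,z)|.
delta = 0.5 * max(np.abs(P[x] - P[y]).sum() for x in range(n) for y in range(n))

# P is self-adjoint in L2(mu): symmetrize with diag(sqrt(mu)) and diagonalize.
d = np.sqrt(mu)
vals, vecs = np.linalg.eigh(d[:, None] * P / d[None, :])
order = np.argsort(np.abs(vals))[::-1]
r2 = abs(vals[order[1]])            # second-largest |eigenvalue|
g = vecs[:, order[1]] / d           # eigenfunction of P with mu(g) = 0

# Lower estimate of r1 = sup{ mu(|Pf|)/mu(|f|) : mu(f) = 0 }: try the
# eigenfunction g plus random test functions projected onto mu(f) = 0.
cands = [g] + [f - (mu * f).sum() for f in rng.normal(size=(2000, n))]
r1_est = max(l1_ratio(f) for f in cands)

# Theorem 1.1: r2 <= r1 <= delta(P).
assert r2 <= r1_est + 1e-9
assert r1_est <= delta + 1e-9
print(f"r2 = {r2:.4f}  <=  r1 >= {r1_est:.4f}  <=  delta(P) = {delta:.4f}")
```

Since the second eigenfunction g is mean-zero and satisfies Pg = λg, the ratio l1_ratio(g) equals |λ|, so the estimate r1_est automatically dominates r2, as the first part of Theorem 1.1 predicts.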

We can use r_1 to describe the convergence of the semigroup P^n in L1(µ).

Theorem 1.2. (a) If r_1 < 1, then µ(|P^n f|) ≤ r_1^n µ(|f|) for all f with µ(f) = 0 and all n ≥ 1.

(b) For m ≥ 1, let

r_1(P^m) := sup { µ(|P^m f|) / µ(|f|) : µ(f) = 0 }.

If r_1(P^m) < 1, then µ(|P^n f|) ≤ [r_1(P^m)]^{[n/m]} µ(|f|) for all f with µ(f) = 0 and all n > m ≥ 1.

Finally, we define a further family of constants related to the L1-Poincaré inequality:

γ0(A) := sup { µ(|Pf − µ(f)|) : f|_{A^c} = 0, µ(|f|) = 1 }.

By using Cheeger's technique, we obtain upper and lower bounds for r_1.

Theorem 1.3. For the constants r_1 and γ0(A), we have

(1/2) sup_A γ0(A) ≤ r_1 ≤ inf_A [ γ0(A) ∨ γ0(A^c) ].   (3)

Corollary 1.1. Assume that P is reversible. Then

(1/2) sup_x Σ_y |P(x,y) − µ(y)| ≤ r_1 ≤ inf_x [ Σ_y |P(x,y) − µ(y)| ∨ γ0({x}^c) ].
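The geometric L1-decay in Theorem 1.2(a) can be illustrated numerically. The sketch below (our own check, with an arbitrary chain and test function) uses δ(P) as an upper bound for r_1 via Theorem 1.1, so Theorem 1.2(a) implies µ(|P^n f|) ≤ δ(P)^n µ(|f|) for mean-zero f.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reversible chain from random symmetric weights, as in the introduction.
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
P = W / W.sum(axis=1, keepdims=True)
mu = W.sum(axis=1) / W.sum()

# delta(P) >= r1 (Theorem 1.1), hence mu(|P^n f|) <= delta(P)^n mu(|f|).
delta = 0.5 * max(np.abs(P[x] - P[y]).sum() for x in range(n) for y in range(n))

f = rng.normal(size=n)
f -= (mu * f).sum()                    # enforce mu(f) = 0
norm0 = (mu * np.abs(f)).sum()

g = f.copy()
for step in range(1, 11):
    g = P @ g                          # g = P^step f, still mean-zero
    assert (mu * np.abs(g)).sum() <= delta**step * norm0 + 1e-12
print("geometric L1 decay verified for n = 1..10")
```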

2. Proof of the results

Proof of Theorem 1.1. In Theorem 1 of Zhang and Wang (2010), we proved that r_2 is an eigenvalue of P. Let g be a corresponding eigenfunction, i.e. Pg = r_2 g. Then

µ(g) = µ(Pg) = µ(r_2 g) = r_2 µ(g).

Since r_2 < 1, it follows that µ(g) = 0. According to the definition of r_1, we have

r_1 ≥ µ(|Pg|)/µ(|g|) = r_2 µ(|g|)/µ(|g|) = r_2.

This proves the first part of inequality (2).


For any f with µ(f) = 0 and µ(|f|) = 1, let A = {x : f(x) ≥ 0} and B = {x : Pf(x) > 0}. We obtain

µ(|Pf|) = Σ_x | Σ_y P(x,y)f(y) | µ(x)
= Σ_{x∈B} Σ_y P(x,y)f(y)µ(x) − Σ_{x∈B^c} Σ_y P(x,y)f(y)µ(x)
= Σ_y Σ_{x∈B} µ(y)P(y,x)f(y) − Σ_y Σ_{x∈B^c} µ(y)P(y,x)f(y)   (by reversibility)
= Σ_y µ(y)f(y) [ Σ_{x∈B} P(y,x) − Σ_{x∈B^c} P(y,x) ]
= Σ_y µ(y)f(y) [ 2 Σ_{x∈B} P(y,x) − 1 ]
= 2 Σ_y µ(y)f(y) Σ_{x∈B} P(y,x)   (since µ(f) = 0)
= 2 Σ_{y∈A} µ(y)f(y) Σ_{x∈B} P(y,x) + 2 Σ_{y∈A^c} µ(y)f(y) Σ_{x∈B} P(y,x)
=: I + II.

Now, consider the first part I. Since f ≥ 0 on A and µ(f) = 0, we obtain

I ≤ sup_{z∈A} P(z,B) · 2 Σ_{y∈A} µ(y)f(y)
= sup_{z∈A} P(z,B) [ Σ_{y∈A} µ(y)f(y) + Σ_{x∈A} µ(x)f(x) ]
= sup_{z∈A} P(z,B) [ Σ_{y∈A} µ(y)f(y) − Σ_{x∈A^c} µ(x)f(x) ]
= sup_{z∈A} P(z,B) [ Σ_{y∈A} µ(y)|f(y)| + Σ_{x∈A^c} µ(x)|f(x)| ]
= sup_{z∈A} P(z,B) Σ_{x∈E} µ(x)|f(x)| ≤ sup_y P(y,B) µ(|f|).

A similar argument applies to the part II; since f < 0 on A^c, we get

II ≤ −µ(|f|) inf_y P(y,B).

Combining the above inequalities, we obtain

µ(|Pf|) ≤ [ sup_y P(y,B) − inf_y P(y,B) ] µ(|f|) ≤ (1/2) sup_{x,y} Σ_z |P(x,z) − P(y,z)| µ(|f|),

where the last inequality holds because, for any x, y,

P(x,B) − P(y,B) = Σ_{z∈B} (P(x,z) − P(y,z)) ≤ Σ_z (P(x,z) − P(y,z))^+ = (1/2) Σ_z |P(x,z) − P(y,z)|.

Then the second part of inequality (2) is obtained. □



Proof of Theorem 1.2. (a) Suppose the L1-Poincaré inequality holds with constant r_1 < 1. Since µ(P^{n−1} f) = µ(f) = 0, we can apply the inequality repeatedly:

µ(|P^n f|) = µ(|P(P^{n−1} f)|) ≤ r_1 µ(|P^{n−1} f|) ≤ · · · ≤ r_1^n µ(|f|).

Part (a) is proved.

(b) Write n = [n/m]m + d with 0 ≤ d < m. Then

µ(|P^n f|) = µ(|P^{[n/m]m + d} f|) = µ(|P^{[n/m]m}(P^d f)|).


According to the definition of r_1(P^m), and by using the result in part (a), we have

µ(|P^{[n/m]m}(P^d f)|) ≤ [r_1(P^m)]^{[n/m]} µ(|P^d f|) ≤ [r_1(P^m)]^{[n/m]} µ(|f|),

where the last step uses µ(|P^d f|) ≤ µ(P^d |f|) = µ(|f|). □

Proof of Theorem 1.3. For ε > 0, choose f_ε satisfying µ(f_ε) = 0, µ(|f_ε|) = 1, and r_1 − ε ≤ µ(|Pf_ε|). For any A ⊂ E, we have

r_1 − ε ≤ µ(|Pf_ε|) = µ(|P(f_ε 1_A + f_ε 1_{A^c})|)
= µ(|P(f_ε 1_A + f_ε 1_{A^c}) − µ(f_ε 1_A + f_ε 1_{A^c})|)   (since µ(f_ε) = 0)
= µ(|P(f_ε 1_A) − µ(f_ε 1_A) + P(f_ε 1_{A^c}) − µ(f_ε 1_{A^c})|)
≤ µ(|P(f_ε 1_A) − µ(f_ε 1_A)|) + µ(|P(f_ε 1_{A^c}) − µ(f_ε 1_{A^c})|)
≤ γ0(A) µ(|f_ε 1_A|) + γ0(A^c) µ(|f_ε 1_{A^c}|)
≤ γ0(A) ∨ γ0(A^c).

Since ε is arbitrary, r_1 ≤ γ0(A) ∨ γ0(A^c) for every A, which proves the right part of inequality (3).

For the left part, fix A ⊂ E and ε > 0, and choose f_ε so that f_ε|_{A^c} = 0, µ(|f_ε|) = 1, and γ0(A) − ε ≤ µ(|Pf_ε − µ(f_ε)|). Then

γ0(A) − ε ≤ µ(|Pf_ε − µ(f_ε)|) = µ(|P(f_ε − µ(f_ε))|) ≤ r_1 µ(|f_ε − µ(f_ε)|) ≤ 2 r_1 µ(|f_ε|) = 2 r_1.

Since ε is arbitrary, γ0(A) ≤ 2r_1 for every A, and the left part of inequality (3) is also proved. □

Proof of Corollary 1.1. Take A = {x} in Theorem 1.3, and let f satisfy f|_{{x}^c} = 0 and f(x) = 1/µ(x), so that µ(|f|) = 1. Then µ(f) = 1 and Pf(y) = P(y,x)f(x), hence

γ0({x}) = µ(|Pf − µ(f)|) = Σ_y µ(y) |Pf(y) − µ(f)|
= Σ_y µ(y) |P(y,x)f(x) − 1|
= Σ_y | µ(y)P(y,x)/µ(x) − µ(y) |
= Σ_y |P(x,y) − µ(y)|,

where the last equality uses the reversibility of µ. By using the conclusion of Theorem 1.3, Corollary 1.1 is proved. □
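As an illustration (our own numerical check, not from the paper), the lower bound of Corollary 1.1 can be compared with δ(P), which dominates r_1 by Theorem 1.1; consistency then requires (1/2) sup_x Σ_y |P(x,y) − µ(y)| ≤ δ(P).

```python
import numpy as np

rng = np.random.default_rng(2)

# Reversible chain from random symmetric weights.
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2
P = W / W.sum(axis=1, keepdims=True)
mu = W.sum(axis=1) / W.sum()

# Lower bound of Corollary 1.1: (1/2) sup_x sum_y |P(x,y) - mu(y)|.
lower = 0.5 * np.abs(P - mu[None, :]).sum(axis=1).max()

# Upper bound for r1 from Theorem 1.1: the Dobrushin coefficient.
delta = 0.5 * max(np.abs(P[x] - P[y]).sum() for x in range(n) for y in range(n))

# Since lower <= r1 <= delta(P), the two computable quantities are ordered.
assert lower <= delta + 1e-12
print(f"{lower:.4f} <= r1 <= {delta:.4f}")
```

The ordering also follows directly from stationarity: µ is a mixture of the rows of P, so each row's L1-distance to µ is at most the largest L1-distance between two rows.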



3. Examples

We now give two examples.

Example 3.1. Consider a chain on the two states {0} and {1} with transition matrix

P = ( α      1 − α )
    ( 1 − β  β     )

with 0 < α, β < 1. Then r_1 = r_2 = δ(P) = α + β − 1.

This example shows that the result in Theorem 1.1 is sharp to some degree. Next, we give a lower bound for r_1, which helps to construct an example with r_1 > r_2.

Lemma 3.1. For any set A ⊂ E, denote B_A = {x ∈ E : P(x,A) ≥ µ(A)}. Then

r_1 ≥ sup_{A⊂E} [ Σ_{x∈B_A} µ(x)P(x,A) − µ(A)µ(B_A) ] / [ µ(A)µ(A^c) ].   (4)
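The constants in Example 3.1 can be computed explicitly for concrete α, β; the sketch below (our own check, with α + β > 1 so that α + β − 1 is positive) verifies δ(P) = α + β − 1 and that the unique (up to scale) mean-zero function attains this value as the L1-ratio, consistent with r_1 = r_2 = δ(P).

```python
import numpy as np

alpha, beta = 0.8, 0.7          # arbitrary values with alpha + beta > 1
P = np.array([[alpha, 1 - alpha],
              [1 - beta, beta]])
mu = np.array([1 - beta, 1 - alpha]) / (2 - alpha - beta)  # reversible law
assert np.allclose(mu @ P, mu)

# Dobrushin coefficient of the two-state chain.
delta = 0.5 * np.abs(P[0] - P[1]).sum()
assert np.isclose(delta, alpha + beta - 1)

# On two states, a mean-zero f is unique up to scale, and it is an
# eigenfunction of P for the eigenvalue alpha + beta - 1; hence the
# L1-ratio mu(|Pf|)/mu(|f|) equals alpha + beta - 1 as well.
f = np.array([mu[1], -mu[0]])   # mu(f) = 0
r1 = (mu * np.abs(P @ f)).sum() / (mu * np.abs(f)).sum()
assert np.isclose(r1, alpha + beta - 1)
print(f"r1 = delta(P) = {r1:.2f}")
```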

Proof of Lemma 3.1. For A ⊂ E, take f = 1_A − µ(A). Obviously, µ(f) = 0. We first compute µ(|f|):

µ(|f|) = µ(|1_A − µ(A)|) = Σ_x |1_A(x) − µ(A)| µ(x) = 2µ(A)µ(A^c).   (5)

We proceed to deal with µ(|Pf|). Since

Pf(x) = P(1_A − µ(A))(x) = Σ_{y∈A} P(x,y) − µ(A)

and B_A = {x ∈ E : P(x,A) ≥ µ(A)}, we have

µ(|Pf|) = Σ_x | Σ_{y∈A} P(x,y) − µ(A) | µ(x)
= Σ_{x∈B_A} µ(x) [ Σ_{y∈A} P(x,y) − µ(A) ] + Σ_{x∈B_A^c} µ(x) [ µ(A) − Σ_{y∈A} P(x,y) ]
= Σ_{x∈B_A} Σ_{y∈A} µ(x)P(x,y) − Σ_{x∈B_A^c} Σ_{y∈A} µ(x)P(x,y) − µ(A)µ(B_A) + µ(A)µ(B_A^c).

Because µ satisfies µ(x)P(x,y) = µ(y)P(y,x) and µ(B_A^c) = 1 − µ(B_A), the above equality becomes

µ(|Pf|) = Σ_{x∈B_A} Σ_{y∈A} µ(y)P(y,x) − Σ_{x∈B_A^c} Σ_{y∈A} µ(y)P(y,x) − µ(A)(2µ(B_A) − 1)
= Σ_{y∈A} µ(y) [ Σ_{x∈B_A} P(y,x) − Σ_{x∈B_A^c} P(y,x) ] − µ(A)(2µ(B_A) − 1)
= Σ_{y∈A} µ(y) [ 2 Σ_{x∈B_A} P(y,x) − 1 ] − µ(A)(2µ(B_A) − 1)
= 2 Σ_{y∈A} µ(y)P(y,B_A) − 2µ(A)µ(B_A)
= 2 Σ_{x∈B_A} µ(x)P(x,A) − 2µ(A)µ(B_A),

where the last equality again uses reversibility. Combining (5) with the above equality, we obtain

r_1 ≥ µ(|P(1_A − µ(A))|) / µ(|1_A − µ(A)|) = [ Σ_{x∈B_A} µ(x)P(x,A) − µ(A)µ(B_A) ] / [ µ(A)µ(A^c) ].

Taking the supremum over A ⊂ E on the right-hand side yields (4). □



By using Lemma 3.1, and taking A = {0, 1}, we have the following example.

Example 3.2. Consider a chain on the state space E = {0, 1, 2, . . .} with transition matrix

P(i,j) = 1 − q,  if i = 0, j = 0;
P(i,j) = q,      if i = 0, j = 1;
P(i,j) = p,      if j = i − 1, i = 1, 2, . . . ;
P(i,j) = r,      if j = i, i = 1, 2, . . . ;
P(i,j) = q,      if j = i + 1, i = 1, 2, . . . ;

with 0 < q < p < 1 and p + r + q = 1. Then

r_1 = 1 > r_2 = 1 − (√p − √q)^2.
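The claimed value of r_2 in Example 3.2 can be probed numerically on a large truncated version of the chain (our own illustration, with arbitrary p, q). Truncation introduces boundary effects, so only approximate agreement of the second eigenvalue with 1 − (√p − √q)^2 is expected.

```python
import numpy as np

p, q = 0.5, 0.2                 # 0 < q < p < 1
r = 1 - p - q
N = 400                         # truncation level

P = np.zeros((N + 1, N + 1))
P[0, 0], P[0, 1] = 1 - q, q
for i in range(1, N + 1):
    P[i, i - 1], P[i, i] = p, r
    if i < N:
        P[i, i + 1] = q
P[N, N] += q                    # fold outgoing mass back (reflecting cut)

# Reversible measure: mu(i) proportional to (q/p)^i.
mu = (q / p) ** np.arange(N + 1)
mu /= mu.sum()
assert np.allclose(mu[:, None] * P, P.T * mu[None, :])   # detailed balance

# Symmetrize in L2(mu) and take the second-largest eigenvalue.
d = np.sqrt(mu)
vals = np.sort(np.linalg.eigvalsh(d[:, None] * P / d[None, :]))
lam2 = vals[-2]

target = 1 - (np.sqrt(p) - np.sqrt(q)) ** 2
assert abs(lam2 - target) < 0.05
print(lam2, target)
```

The symmetrized operator is a tridiagonal Jacobi matrix with off-diagonal entries √(pq), whose bulk spectrum tops out at r + 2√(pq) = 1 − (√p − √q)^2; the truncated matrix's second eigenvalue approaches this value from below as N grows.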

References

Anderson, W.J., 1991. Continuous-Time Markov Chains. Springer-Verlag.
Bakry, D., 2002. Functional Inequalities for Markov Semigroups. Notes de cours, Tata Institute, Bombay.
Chen, M.F., 1996. Estimation of spectral gap for Markov chains. Acta Math. Sin. New Ser. 12 (4), 337–360.
Chen, M.F., Wang, F.Y., 2000. Cheeger's inequalities for general symmetric forms and existence criteria for spectral gap. Ann. Probab. 28, 235–257.
Chen, M.F., 2004. Eigenvalues, Inequalities, and Ergodic Theory. Springer, London.
Deng, C.S., Song, Y.H., 2012. Weak log-Sobolev and Lp weak Poincaré inequalities for general symmetric forms. Front. Math. China 7 (6), 1059–1071.
Diaconis, P., 2009. Threads through group theory. In: Character Theory of Finite Groups. Contemp. Math. 524, 33–45.
Saloff-Coste, L., 2004. Random walks on finite groups. In: Probability on Discrete Structures. Encyclopaedia Math. Sci., vol. 110. Springer, Berlin, pp. 263–346.
Wang, F.Y., 2012. On L1-log-Sobolev inequalities and L1-Poincaré inequalities. Oral communication.
Wuebker, A., 2012. Spectral theory for weakly reversible Markov chains. J. Appl. Probab. 49 (1), 245–265.
Zhang, L.H., Wang, Y.Z., 2010. L2-geometric convergence for time discrete Markov processes. Math. Appl. 23 (2), 340–344.