Statistics & Probability Letters 56 (2002) 199–206
A note on various holding probabilities for random lazy random walks on finite groups

Martin Hildebrand

Department of Mathematics and Statistics, State University of New York — University at Albany, Albany, NY 12222, USA

Received June 2001; received in revised form September 2001
Abstract

The author previously considered certain lazy random walks on arbitrary finite groups. Given a $k$-tuple $(g_1, \ldots, g_k)$ of elements of a finite group, one multiplies the previous position of the walk by $g_i^{\varepsilon}$ where $i$ is uniform on $\{1, \ldots, k\}$ and $\varepsilon$ has a given distribution on $\{1, 0, -1\}$. The previous work gave good bounds if $P(\varepsilon = 1) = P(\varepsilon = -1) = 1/4$ and $P(\varepsilon = 0) = 1/2$ or if $P(\varepsilon = 1) = P(\varepsilon = 0) = 1/2$. The current paper develops some elementary comparison techniques which work for other distributions for $\varepsilon$ such as $P(\varepsilon = 1) = P(\varepsilon = 0) = P(\varepsilon = -1) = 1/3$. © 2002 Elsevier Science B.V. All rights reserved.
1. Introduction

In Hildebrand (2001), the author considered certain "lazy" random walks on arbitrary finite groups. Suppose $(g_1, \ldots, g_k)$ is a $k$-tuple of elements in a finite group $G$ of order $n$. Define $P_{\mathrm{sym}}(s) = \Pr(s = g_i^a)$ and $Q_{\mathrm{sym}}(s) = \Pr(s = g_i^b)$ where $i$ is uniform on $\{1, \ldots, k\}$, $a$ is uniform on $\{-1, 0, 1\}$, $\Pr(b = -1) = \Pr(b = 1) = 1/4$, $\Pr(b = 0) = 1/2$, $i$ and $a$ are independent, and $i$ and $b$ are independent. Let $P^{*m}(s) = \Pr(X_1 X_2 \cdots X_m = s)$ where $X_1, \ldots, X_m$ are i.i.d. random variables with $\Pr(X_i = s) = P(s)$. Let $d_P(m)$ be the variation distance of $P^{*m}$ from the uniform distribution, i.e.

$$d_P(m) = \frac{1}{2} \sum_{s \in G} \left| P^{*m}(s) - \frac{1}{|G|} \right|.$$
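To make these definitions concrete, here is a minimal sketch (not from the paper) that computes $P^{*m}$ exactly by repeated convolution and evaluates $d_P(m)$. It assumes the cyclic group $\mathbb{Z}_n$ written additively, so that multiplying by $g_i^{\varepsilon}$ becomes adding $\varepsilon g_i \bmod n$; the parameter values and function names are illustrative choices only.

```python
import random

def step_distribution(generators, n, exponent_dist):
    """One-step distribution P(s) = Pr(s = eps*g_i mod n) on Z_n, where i is
    uniform over the k-tuple and eps has law exponent_dist (a dict mapping
    exponents in {-1, 0, 1} to probabilities)."""
    k = len(generators)
    P = [0.0] * n
    for g in generators:
        for eps, p in exponent_dist.items():
            P[(eps * g) % n] += p / k
    return P

def convolve(P, Q, n):
    """Distribution of X + Y mod n for independent X ~ P, Y ~ Q."""
    R = [0.0] * n
    for x in range(n):
        for y in range(n):
            R[(x + y) % n] += P[x] * Q[y]
    return R

def variation_distance(P, n):
    """d(P) = (1/2) * sum_s |P(s) - 1/n|."""
    return 0.5 * sum(abs(p - 1.0 / n) for p in P)

n, k = 101, 9                                    # |G| = n; k roughly log2(n) + f(n)
gens = [random.randrange(n) for _ in range(k)]   # a uniform random k-tuple
Qsym = {1: 0.25, -1: 0.25, 0: 0.5}               # holding probability 1/2
Psym = {1: 1 / 3, -1: 1 / 3, 0: 1 / 3}           # holding probability 1/3
for dist, name in [(Qsym, "Qsym"), (Psym, "Psym")]:
    P1 = step_distribution(gens, n, dist)
    Pm = P1
    for m in range(2, 41):                       # P^{*40} by repeated convolution
        Pm = convolve(Pm, P1, n)
    print(f"{name}: d(40) = {variation_distance(Pm, n):.4f}")
```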
In Hildebrand (2001), the author built upon results of Erdős and Rényi (1965) and Pak (1999) to show

Theorem 1. Suppose $k = \log_2 n + f(n)$ where $f(n) \to \infty$ as $n \to \infty$. Let $\epsilon > 0$ be given. If $m = m(n) \ge (1 + \epsilon)(\log_2 n) \ln(\log_2 n)$, then $E[d_{Q_{\mathrm{sym}}}(m)] \to 0$ as $n \to \infty$ where the expectation is over a uniform choice of the $n^k$ possible $k$-tuples $(g_1, \ldots, g_k)$.

By using a comparison theorem of Ross and Xu (1993), the author (Hildebrand, 2001) also showed

Theorem 2. Suppose $k = \log_2 n + f(n)$ where $f(n) \to \infty$ as $n \to \infty$. For some constant $c > 0$, if $m = m(n) \ge c(\log n)^2 \log(\log n)$, then $E[d_{P_{\mathrm{sym}}}(m)] \to 0$ as $n \to \infty$ where the expectation is as in Theorem 1.

Looking at the transition from Theorem 1 to Theorem 2, the reader may wonder about the extra factor of $\log n$. While issues such as parity concerns might cause an increase in the bound on $m$, the extra factor of $\log n$ seems large since the "holding probability" in Theorem 2 is still $1/3$. Theorem 4 below will eliminate this extra factor of $\log n$. The proof of this theorem will use a specialized comparison theorem developed in this paper. This theorem can also be used in cases where the exponent has an asymmetric distribution on $\{-1, 0, 1\}$.

Diaconis and Saloff-Coste (1993) have developed some comparison techniques for reversible Markov chains. These techniques involve eigenvalue arguments. If the exponent of $g_i$ has an asymmetric distribution on $\{-1, 0, 1\}$, then the corresponding Markov chain may not be reversible. Furthermore, while one can easily relate the eigenvalues of the transition matrices corresponding to $P_{\mathrm{sym}}$ and $Q_{\mathrm{sym}}$, considerable information may be lost in trying to go between the variation distances $d_{P_{\mathrm{sym}}}$ and $d_{Q_{\mathrm{sym}}}$ by merely using this eigenvalue relation and bounds involving eigenvalues.

The reader familiar with Hildebrand (2001) or with Pak (1999) may wish to consider expressions of the form $g_{i_1}^{\varepsilon_1} g_{i_2}^{\varepsilon_2} \cdots g_{i_j}^{\varepsilon_j}$ where $i_1, \ldots, i_j$ are i.i.d. uniform on $\{1, \ldots, k\}$ and $\varepsilon_1, \ldots, \varepsilon_j$ are i.i.d. In the case where $\varepsilon_i$ is uniform on $\{-1, 0, 1\}$, one may wish to "pull through" (as in Pak (1999) or Hildebrand (2001)) terms $g_{i_\ell}^{\varepsilon_\ell}$ where the $\varepsilon_\ell$ is an "excessive" $-1$ or $1$ compared with the case $P(\varepsilon_i = -1) = P(\varepsilon_i = 1) = 1/4$ and $P(\varepsilon_i = 0) = 1/2$. This method, however, encounters difficulty when an "excessive" $\varepsilon_\ell$ corresponds to a value $i_\ell \notin \{i_1, \ldots, i_{\ell-1}\}$, i.e. when the $i_\ell$th coordinate appears for the first time. In this case, "pulling through" might create terms such as $g_2$ and $g_1^{g_1 g_2}$ (where $g^x := xgx^{-1}$) which we would need to be independent and uniform on $G$ (over the $n^k$ choices of the $k$-tuple $(g_1, \ldots, g_k)$) to make the "pulling through" argument work but, unlike expressions like $g_1$ and $g_2^{g_1}$, they need not be so. (The pull-through identity itself is illustrated in the sketch at the end of this section.) If one tries to avoid this difficulty by pulling through all terms $g_{i_n}^{\varepsilon_n}$ with $i_n = i_\ell$, then one loses too many elements from the $k$-tuple $(g_1, \ldots, g_k)$.

For the comparison theorem we use, let $P$ be a doubly stochastic matrix. Let $P_a = aI + (1 - a)P$ if $0 \le a < 1$. Let $S$ be the set of states of a Markov chain, and assume the initial state $s_0$ is fixed. Let $P^m(s)$ be the probability that the Markov chain is in state $s$ after $m$ steps of the Markov chain.
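The "pull-through" step mentioned above rests on the elementary identity $wg = (wgw^{-1})w = g^w w$: a generator sitting deep inside a word can be moved to the front at the cost of conjugating it by the prefix. Here is a minimal check of this identity, not from the paper, using permutations of $\{0, 1, 2, 3\}$ represented as Python dicts (an illustrative choice of group and representation):

```python
def compose(p, q):
    """(p o q)(x) = p(q(x)); composition of permutations given as dicts."""
    return {x: p[q[x]] for x in q}

def inverse(p):
    """Inverse permutation."""
    return {v: k for k, v in p.items()}

g1 = {0: 1, 1: 0, 2: 2, 3: 3}    # the transposition (0 1)
g2 = {0: 0, 1: 2, 2: 1, 3: 3}    # the transposition (1 2)

lhs = compose(g1, g2)                          # g1 * g2
conj = compose(compose(g1, g2), inverse(g1))   # g1 g2 g1^{-1}, i.e. g2 conjugated by g1
rhs = compose(conj, g1)                        # (g2 conjugated by g1) * g1
print(lhs == rhs)                              # True: w*g = (w g w^{-1})*w
```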
Recall the variation distance

$$\|P^m - U\| := \frac{1}{2} \sum_{s \in S} \left| P^m(s) - \frac{1}{|S|} \right|.$$

The comparison theorem we use follows.

Theorem 3. Suppose $n$ and $m$ are positive integers and $0 < b < a < 1$. Suppose $n_1$, $n_2$, $r$, and $R$ are positive integers such that $n_2 \ge n_1$, $n_1 + R \le m$, $n_2 + R \le n$, and $n_1 - r \ge 0$. Then

$$\|P_b^n - U\| \le \|P_a^m - U\| + \sum_{j=-r}^{R} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)|$$
$$\qquad + \Pr(X_1 < n_1 - r) + \Pr(X_2 < n_2 - r) + \Pr(X_1 > n_1 + R) + \Pr(X_2 > n_2 + R),$$

where $X_1$ is a binomial random variable with parameters $m$ and $1 - a$ and $X_2$ is a binomial random variable with parameters $n$ and $1 - b$.

One of the conclusions we get from this theorem is

Theorem 4. Suppose $k = \log_2 n + f(n)$ where $f(n) \to \infty$ as $n \to \infty$. Let $\epsilon > 0$ be given. If $m = m(n) \ge (1 + \epsilon)(9/8)(\log_2 n) \ln(\log_2 n)$, then $E[d_{P_{\mathrm{sym}}}(m)] \to 0$ where the expectation is as in Theorem 1.
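Theorem 3 is easy to probe numerically. The following sketch (not from the paper; it assumes numpy, takes $P$ to be the nearest-neighbour walk on a 5-cycle, and picks $m$, $n$, $n_1$, $n_2$, $r$, $R$ so the theorem's hypotheses hold) computes both sides of the inequality:

```python
import numpy as np
from math import comb

def binom_pmf(N, p, j):
    """Point mass of a Binomial(N, p) random variable at j."""
    if j < 0 or j > N:
        return 0.0
    return comb(N, j) * p**j * (1 - p)**(N - j)

def tv_from_uniform(row):
    """Variation distance of a distribution (1-D array) from uniform."""
    S = len(row)
    return 0.5 * np.abs(row - 1.0 / S).sum()

# A doubly stochastic P: the nearest-neighbour walk on a 5-cycle.
S = 5
P = np.zeros((S, S))
for s in range(S):
    P[s, (s - 1) % S] = P[s, (s + 1) % S] = 0.5

a, b = 0.5, 1 / 3                    # holding probabilities, 0 < b < a < 1
m = 40
n = 45                               # chosen so m*a*(1-a) = n*b*(1-b) = 10
n1, n2 = 20, 30                      # the means of X1 and X2
r = R = min(n - n2, m - n1, n1, n2)  # = 15; all hypotheses of Theorem 3 hold

Pa = a * np.eye(S) + (1 - a) * P
Pb = b * np.eye(S) + (1 - b) * P
start = np.zeros(S); start[0] = 1.0  # fixed initial state s0 = 0

lhs = tv_from_uniform(start @ np.linalg.matrix_power(Pb, n))
rhs = tv_from_uniform(start @ np.linalg.matrix_power(Pa, m))
rhs += sum(abs(binom_pmf(m, 1 - a, n1 + j) - binom_pmf(n, 1 - b, n2 + j))
           for j in range(-r, R + 1))
rhs += sum(binom_pmf(m, 1 - a, j) for j in range(0, n1 - r))          # Pr(X1 < n1 - r)
rhs += sum(binom_pmf(n, 1 - b, j) for j in range(0, n2 - r))          # Pr(X2 < n2 - r)
rhs += sum(binom_pmf(m, 1 - a, j) for j in range(n1 + R + 1, m + 1))  # Pr(X1 > n1 + R)
rhs += sum(binom_pmf(n, 1 - b, j) for j in range(n2 + R + 1, n + 1))  # Pr(X2 > n2 + R)
print(f"lhs = {lhs:.6f} <= rhs = {rhs:.6f}: {lhs <= rhs}")
```

With these choices the variances $ma(1-a)$ and $nb(1-b)$ match exactly, which, as Section 3 explains, is what keeps the binomial terms on the right-hand side small.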
2. Proof of Theorem 3

To prove this theorem, we will use the following proposition, which has an elementary proof left to the reader.

Proposition 5. Suppose $Q$ and $P$ are doubly stochastic matrices. Assume that Markov chains with transition matrices $PQ$ and $P$ both start in state $s_0$. Then $\|PQ - U\| \le \|P - U\|$.

Proposition 5 implies $\|P_a^m P^{n_2 - n_1} - U\| \le \|P_a^m - U\|$. Observe that by the triangle inequality, $\|P_b^n - U\| \le \|P_a^m P^{n_2 - n_1} - U\| + \|P_b^n - P_a^m P^{n_2 - n_1}\|$ where $\|P - Q\| := \frac{1}{2} \sum_{s \in S} |P(s) - Q(s)|$ for Markov chains with transition matrices $P$ and $Q$ and fixed initial state $s_0$. To examine $\|P_b^n - P_a^m P^{n_2 - n_1}\|$, we shall use a coupling idea described in Aldous (1983). If $Z_1$ and $Z_2$ are random variables with probabilities $Q_1$ and $Q_2$, then $\|Q_1 - Q_2\| \le \Pr(Z_1 \ne Z_2)$. We shall construct random variables $Z_1$ and $Z_2$ with probabilities $P_a^m P^{n_2 - n_1}$ and $P_b^n$, respectively. Let $X_1$ and $X_2$ be the binomial random variables described in
Theorem 3. For each $s \in S$, with probability

$$\sum_{j=-r}^{R} \min(\Pr(X_1 = n_1 + j), \Pr(X_2 = n_2 + j)) P^{n_2 + j}(s),$$

let $Z_1 = Z_2 = s$. Otherwise, we shall define $Z_1$ and $Z_2$ separately. With additional probability

$$\sum_{j < -r} \Pr(X_1 = n_1 + j) P^{n_2 + j}(s) + \sum_{j > R} \Pr(X_1 = n_1 + j) P^{n_2 + j}(s) + \sum_{j=-r}^{R} \max(0, \Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)) P^{n_2 + j}(s),$$

let $Z_1 = s$, and with additional probability

$$\sum_{j < -r} \Pr(X_2 = n_2 + j) P^{n_2 + j}(s) + \sum_{j > R} \Pr(X_2 = n_2 + j) P^{n_2 + j}(s) + \sum_{j=-r}^{R} \max(0, \Pr(X_2 = n_2 + j) - \Pr(X_1 = n_1 + j)) P^{n_2 + j}(s),$$

let $Z_2 = s$. Observe that $P_a^m(s) = \sum_{j=0}^{m} \Pr(X_1 = j) P^j(s)$ and $P_b^n(s) = \sum_{j=0}^{n} \Pr(X_2 = j) P^j(s)$. Thus $Z_1$ and $Z_2$ have the desired distributions, and

$$\Pr(Z_1 \ne Z_2) \le \sum_{j=-r}^{R} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)| + \Pr(X_1 < n_1 - r) + \Pr(X_2 < n_2 - r) + \Pr(X_1 > n_1 + R) + \Pr(X_2 > n_2 + R).$$
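The "observe" step is the binomial-mixture identity $P_a^m = \sum_{j=0}^{m} \binom{m}{j} a^{m-j} (1-a)^j P^j$, which holds because $I$ and $P$ commute. A quick numerical sanity check of this identity (not from the paper; it assumes numpy and reuses the 5-cycle $P$ from the earlier sketch):

```python
import numpy as np
from math import comb

# The same doubly stochastic P as before: nearest-neighbour walk on a 5-cycle.
S = 5
P = np.zeros((S, S))
for s in range(S):
    P[s, (s - 1) % S] = P[s, (s + 1) % S] = 0.5

a, m = 0.4, 12
Pa_m = np.linalg.matrix_power(a * np.eye(S) + (1 - a) * P, m)

# Binomial expansion of (aI + (1-a)P)^m: a Bin(m, 1-a) mixture of powers of P.
mix = sum(comb(m, j) * a**(m - j) * (1 - a)**j * np.linalg.matrix_power(P, j)
          for j in range(m + 1))
print(np.allclose(Pa_m, mix))   # True: P_a^m(s) = sum_j Pr(X1 = j) P^j(s)
```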
3. Where the variances of $X_1$ and $X_2$ are close to equal

To use Theorem 3, it is helpful to choose values of $m$, $n$, $n_1$, $n_2$, $r$ and $R$ so that the terms involving probabilities of $X_1$ and $X_2$ are small. In particular, if $n$ and $m$ are related so that the variances of $X_1$ and $X_2$ are equal or very close to equal and $n_1$ and $n_2$ are approximately the expected values of $X_1$ and $X_2$, respectively, then those terms (with appropriate choice of $r$ and $R$) will be small for fixed $a$ and $b$ and large $n$ and $m$. Theorem 6 formalizes these ideas.

Theorem 6. Suppose $0 < b < a < 1$ and $a$ and $b$ are constants. Suppose

$$n = \left\lceil \frac{a(1-a)}{b(1-b)} \, m \right\rceil,$$

$n_1 = m(1-a) + \delta_1$ with $-a < \delta_1 \le 1 - a$, $n_2 = n(1-b) + \delta_2$ with $-b < \delta_2 \le 1 - b$, and $r = \min(n - n_2, m - n_1, n_1, n_2)$. Suppose $X_1$ and $X_2$ are as in Theorem 3. Then

$$\lim_{m \to \infty} \Pr(X_1 < n_1 - r) + \Pr(X_2 < n_2 - r) + \Pr(X_1 > n_1 + r) + \Pr(X_2 > n_2 + r) = 0$$

and

$$\lim_{m \to \infty} \sum_{j=-r}^{r} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)| = 0.$$
Note that since $a > b$, we may conclude that $n_2 > n_1$ for sufficiently large values of $m$.

Proof. The first limit is a straightforward application of Chebyshev's inequality, since $r$ grows linearly in $m$ while the standard deviations of $X_1$ and $X_2$ grow only like $\sqrt{m}$. So let us consider

$$\lim_{m \to \infty} \sum_{j=-r}^{r} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)|.$$
Let $\epsilon > 0$ be given. By the Central Limit Theorem, for some constant $c_1 > 0$, there is a value $M_1$ such that if $m > M_1$, then all of the following occur:

$$\Pr(X_1 > n_1 + c_1 \sqrt{n_1 a(1-a)}) < \epsilon/16, \qquad \Pr(X_1 < n_1 - c_1 \sqrt{n_1 a(1-a)}) < \epsilon/16,$$
$$\Pr(X_2 > n_2 + c_1 \sqrt{n_2 b(1-b)}) < \epsilon/16, \qquad \Pr(X_2 < n_2 - c_1 \sqrt{n_2 b(1-b)}) < \epsilon/16.$$

Now choose a constant $c_2 > c_1$. Let $f(x) = (1/\sqrt{2\pi}) e^{-x^2/2}$. Let $h_1 = 1/\sqrt{ma(1-a)}$ and $h_2 = 1/\sqrt{nb(1-b)}$. Note that by the definition of $n$, we get $\lim_{m \to \infty} h_1/h_2 = 1$. This implies

$$\lim_{m \to \infty} \frac{h_1 f(kh_1)}{h_2 f(kh_2)} = 1$$

uniformly over all $k \in K(m)$ where $K(m) = \{k : \max(|kh_1|, |kh_2|) \le c_2\}$. Let $a_k = \Pr(X_1 = n_1 + k)$ and $b_k = \Pr(X_2 = n_2 + k)$. Theorem 1 on p. 184 (Vol. 1, 3rd Edition) of Feller (1968) implies that for every $\delta > 0$, there is a value $M_2$ such that if $m > M_2$, then

$$1 - \delta < \frac{a_k}{h_1 f(kh_1)} < 1 + \delta \quad \text{and} \quad 1 - \delta < \frac{b_k}{h_2 f(kh_2)} < 1 + \delta$$

for all $k \in K(m)$. Thus for some $M_3$, if $m > M_3$, then

$$1 - \frac{\epsilon}{8} < \frac{a_k}{b_k} < 1 + \frac{\epsilon}{8}$$
for all $k \in K(m)$. For $k \in K(m)$ and $m > M_3$, we get $|a_k - b_k| \le (\epsilon/8) b_k$. Thus

$$\sum_{k \in K(m)} |a_k - b_k| \le \frac{\epsilon}{8} \sum_{k \in K(m)} b_k \le \frac{\epsilon}{8}.$$
Now

$$\sum_{j=-r}^{r} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)| \le \sum_{k \in K(m)} |a_k - b_k| + \sum_{k \in [-r, r] \setminus K(m)} (a_k + b_k).$$
For some value $M_4$, if $m > M_4$, then both $|kh_1| > c_1$ and $|kh_2| > c_1$ whenever $k \notin K(m)$. Thus for some value $M_5$, if $m > M_5$, then

$$\sum_{j=-r}^{r} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)| \le \frac{\epsilon}{8} + 4 \cdot \frac{\epsilon}{16} < \epsilon.$$
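The two limits in Theorem 6 can be watched converging numerically. Here is a sketch (not from the paper; pure Python, with the binomial point masses computed in log-space to avoid floating-point underflow for large $m$, and $a = 1/2$, $b = 1/3$ as in the next section):

```python
from math import ceil, exp, lgamma, log

def pmf(N, p, j):
    """Binomial(N, p) point mass at j, via log-space to avoid underflow."""
    if j < 0 or j > N:
        return 0.0
    logc = lgamma(N + 1) - lgamma(j + 1) - lgamma(N - j + 1)
    return exp(logc + j * log(p) + (N - j) * log(1 - p))

a, b = 0.5, 1 / 3
for m in [40, 400, 4000]:
    n = ceil(m * a * (1 - a) / (b * (1 - b)))        # variances nearly match
    n1, n2 = round(m * (1 - a)), round(n * (1 - b))  # near the means of X1, X2
    r = min(n - n2, m - n1, n1, n2)
    centre = sum(abs(pmf(m, 1 - a, n1 + j) - pmf(n, 1 - b, n2 + j))
                 for j in range(-r, r + 1))
    tails = (sum(pmf(m, 1 - a, j) for j in range(0, n1 - r)) +
             sum(pmf(n, 1 - b, j) for j in range(0, n2 - r)) +
             sum(pmf(m, 1 - a, j) for j in range(n1 + r + 1, m + 1)) +
             sum(pmf(n, 1 - b, j) for j in range(n2 + r + 1, n + 1)))
    print(m, round(centre, 4), tails)
```

Both printed quantities shrink as $m$ grows, as the theorem asserts.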
4. Proof of Theorem 4

Note that $Q_{\mathrm{sym}} = (1/2)I + (1/2)P$ and $P_{\mathrm{sym}} = (1/3)I + (2/3)P$ where $P$ is the probability distribution of $g_i^{\eta}$ where $i$ is uniform on $\{1, \ldots, k\}$ and $\eta$ is uniform on $\{-1, 1\}$. ($i$ and $\eta$ are independent.) Let $m_1$ be a multiple of 8 lying between $(1 + \epsilon/2)(\log_2 n) \ln(\log_2 n)$ and $(1 + \epsilon)(\log_2 n) \ln(\log_2 n)$. (Assume $n$ is large enough so that there is such a multiple.) Let

$$m_2 = \frac{\frac{1}{2} \cdot \frac{1}{2}}{\frac{1}{3} \cdot \frac{2}{3}} \, m_1 = \frac{9}{8} m_1.$$

Note that $m_2$ is an integer. Let $n_1 = m_1(1/2)$, $n_2 = m_2(2/3) = m_1(3/4)$, and $r = (3/8)m_1$. For a given $k$-tuple $(g_1, \ldots, g_k)$, observe from Theorem 3 that

$$d_{P_{\mathrm{sym}}}(m_2) \le d_{Q_{\mathrm{sym}}}(m_1) + \sum_{j=-r}^{r} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)|$$
$$\qquad + \Pr(X_1 < n_1 - r) + \Pr(X_2 < n_2 - r) + \Pr(X_1 > n_1 + r) + \Pr(X_2 > n_2 + r),$$

where $X_1$ is a binomial random variable with parameters $m_1$ and $1/2$ and $X_2$ is a binomial random variable with parameters $m_2$ and $2/3$. However, by Theorem 6, as $n \to \infty$,

$$\sum_{j=-r}^{r} |\Pr(X_1 = n_1 + j) - \Pr(X_2 = n_2 + j)| + \Pr(X_1 < n_1 - r) + \Pr(X_2 < n_2 - r) + \Pr(X_1 > n_1 + r) + \Pr(X_2 > n_2 + r) \to 0$$

uniformly over all $k$-tuples $(g_1, \ldots, g_k)$. Theorem 1, with the help of the fact that $d_{Q_{\mathrm{sym}}}(m_1) \le d_{Q_{\mathrm{sym}}}(\lceil (1 + \epsilon/2)(\log_2 n) \ln(\log_2 n) \rceil)$, gives us that $E[d_{Q_{\mathrm{sym}}}(m_1)] \to 0$ as $n \to \infty$. Thus $E[d_{P_{\mathrm{sym}}}(m_2)] \to 0$ as $n \to \infty$, and Theorem 4 follows since $d_{P_{\mathrm{sym}}}(m) \le d_{P_{\mathrm{sym}}}(m_2)$ because $m_2 \le m$.
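As a numeric sanity check on the parameters in this proof (a sketch, not from the paper, reusing the log-space pmf of the previous sketch and an illustrative value $m_1 = 800$, a multiple of 8), the comparison terms supplied by Theorem 3 are already quite small:

```python
from math import exp, lgamma, log

def pmf(N, p, j):
    """Binomial(N, p) point mass at j, computed in log-space."""
    if j < 0 or j > N:
        return 0.0
    logc = lgamma(N + 1) - lgamma(j + 1) - lgamma(N - j + 1)
    return exp(logc + j * log(p) + (N - j) * log(1 - p))

m1 = 800                                         # a multiple of 8 (illustrative)
m2 = 9 * m1 // 8                                 # = 900, an integer as claimed
n1, n2, r = m1 // 2, 3 * m1 // 4, 3 * m1 // 8    # 400, 600, 300
# Var(X1) = m1/4 = 200 = m2*(2/3)*(1/3) = Var(X2): the variances match exactly.
err = sum(abs(pmf(m1, 0.5, n1 + j) - pmf(m2, 2 / 3, n2 + j))
          for j in range(-r, r + 1))
err += sum(pmf(m1, 0.5, j) for j in range(0, n1 - r))             # Pr(X1 < n1 - r)
err += sum(pmf(m2, 2 / 3, j) for j in range(0, n2 - r))           # Pr(X2 < n2 - r)
err += sum(pmf(m1, 0.5, j) for j in range(n1 + r + 1, m1 + 1))    # Pr(X1 > n1 + r)
err += sum(pmf(m2, 2 / 3, j) for j in range(n2 + r + 1, m2 + 1))  # Pr(X2 > n2 + r)
print(err)   # small: passing from Qsym to Psym costs very little here
```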
5. Further comments on using Theorem 3

This theorem can be used to explore other lazy random walks on finite groups. For example, suppose $(g_1, \ldots, g_k)$ is a $k$-tuple of elements in a finite group $G$ of order $n$, $Q(s) = \Pr(s = g_i^c)$ and $P(s) = \Pr(s = g_i^d)$ where $i$ is uniform on $\{1, \ldots, k\}$, $c$ is uniform on $\{0, 1\}$, $\Pr(d = 0) = 1/3$, $\Pr(d = 1) = 2/3$, $i$ and $c$ are independent, and $i$ and $d$ are independent. In Hildebrand (2001), the author showed that Theorem 1 still holds if $Q_{\mathrm{sym}}$ is replaced with $Q$. Thus, by the same argument which proved Theorem 4, Theorem 4 still holds if $P_{\mathrm{sym}}$ is replaced with $P$. Similar arguments can be made for other distributions for the exponent.

In Theorem 3, one cannot replace $I$ in the definition of $P_a$ with the transition matrix corresponding to a deterministic bijection. For example, suppose we are looking at Markov chains on the integers mod $p$ where $p$ is an odd integer. Suppose $P$ is the Markov chain corresponding to multiplying (mod $p$) by 2 and then adding (mod $p$) $+1$ or $-1$ with probability $1/2$ each, $Q$ is the Markov chain corresponding to the deterministic bijection of multiplying (mod $p$) by 2, and $Q_a = aQ + (1 - a)P$. Chung et al. (1987) showed that for some $p$, a multiple of $(\log p) \log(\log p)$ steps is needed to make the Markov chain based on $Q_{1/3}$ get close to uniform, while a multiple of $\log p$ steps suffices for the Markov chain based on $Q_{1/2}$. Such differing orders cannot be obtained from Theorem 3 when looking at $P_{1/3}$ and $P_{1/2}$.

The reader wishing to bound $\|P_b^n - U\|$ in terms of $\|P_a^m - U\|$ in the case $a < b$ may wish to use the following proposition.

Proposition 7. If $0 \le a < b < 1$, then $\|P_b^n - U\| \le \|P_a^m - U\| + p(a, b, n, m)$ where $p(a, b, n, m)$ is the probability that a binomial random variable with parameters $n$ and $(1 - b)/(1 - a)$ is less than $m$.

The elementary proof and the application of this proposition to other distributions for the exponent are left as exercises for the reader.

Acknowledgements

The author would like to thank Ben Jamison for suggesting a reference which proved to be helpful. The author would also like to thank an anonymous reader of an earlier version for suggesting some simplifications, one of which proved helpful in simplifying the arguments while the others served to warn the author of missimplifications that readers may be tempted to make. The author also would like to thank the referee and an associate editor for a few suggestions.

References

Aldous, D., 1983. Random walks on finite groups and rapidly mixing Markov chains. In: Séminaire de Probabilités XVII, Lecture Notes in Mathematics, Vol. 986. Springer, Berlin, pp. 243–297.
Chung, F., Diaconis, P., Graham, R., 1987. A random walk problem arising in random number generation. Ann. Probab. 15, 1148–1165.
Diaconis, P., Saloff-Coste, L., 1993. Comparison techniques for reversible Markov chains. Ann. Appl. Probab. 3, 696–730.
Erdős, P., Rényi, A., 1965. Probabilistic methods in group theory. J. Analyse Math. 14, 127–138.
Feller, W., 1968. An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition. Wiley, New York.
Hildebrand, M., 2001. Random lazy random walks on arbitrary finite groups. J. Theoret. Probab. 14, 1019–1034.
Pak, I., 1999. Random walks on finite groups with few random generators. Electron. J. Probab. 4, 1–11.
Ross, K., Xu, D., 1993. A comparison theorem on convergence rates of random walks on groups. J. Theoret. Probab. 6, 323–343.