Random walks on graphs and Monte Carlo methods

Wen-Ju Cheng (a), Jim Cox (b), Paula Whitlock (b,∗)

(a) Computer Science, Graduate Center, CUNY, Fifth Avenue, New York, NY, United States
(b) CIS Department, Brooklyn College/CUNY, 2900 Bedford Avenue, Brooklyn, NY, United States

Mathematics and Computers in Simulation. Original article. Available online at www.sciencedirect.com (ScienceDirect); www.elsevier.com/locate/matcom.

Received 9 July 2014; received in revised form 18 December 2015; accepted 18 December 2015

Abstract

This paper relates the study of random walks on graphs and directed graphs to random walks that arise in Monte Carlo methods applied to optimization problems. Previous results on simple graphs are surveyed and new results on the mixing times for Markov chains are derived.

© 2016 International Association for Mathematics and Computers in Simulation (IMACS). Published by Elsevier B.V. All rights reserved.

Keywords: Graph theory; Random walks; Markov chain Monte Carlo

1. Introduction

In this paper we discuss the relationship among random walks arising from several different application areas. Principally, we relate the study of random walks on graphs and directed graphs (digraphs) to the random walks arising from certain problems that are solved using Monte Carlo methods.

A primary theoretical application for the study of random walks on graphs has been to computational complexity theory. There is a rich relationship between random walks on graphs and space-bounded complexity classes. The major open and solved problems in the area of log-space-bounded complexity classes can all be restated as problems in graph search and, in particular, randomized graph search [13,1,14,9,5,7]. Random walks can easily be limited to log-space memory usage during a graph search, since they only have to remember the current vertex, which takes log n bits. In contrast, a naïve deterministic search would have to remember some history of the visited vertices, using at least linear space in the size of the graph. Practical applications of random walks on graphs have been to percolation problems [2] and to electrical networks [8].

Each of the well-studied log-space-bounded complexity classes can be transformed (in deterministic log space) into an equivalent class of graph search problems. The model used by complexity theorists is the Turing machine (readers unfamiliar with Turing machines may view them as simply a special kind of program running on a very primitive computer). L corresponds to undirected graph search, RL and BPL correspond to randomized search of certain well-behaved digraphs, and NL corresponds to a search of general digraphs. One may ask why one does not

∗ Corresponding author. E-mail address: [email protected] (P. Whitlock).
http://dx.doi.org/10.1016/j.matcom.2015.12.006

just use pseudo-random or quasi-random bits for these probabilistic programs? Replacing the random bits by a pseudo-random number generator that uses an O(log n) sized random seed would show that L = BPL. Perhaps surprisingly, no one has been able to do this. The best result so far is the so-called Nisan–Wigderson generator [13], which can fool random O(log n) space Turing machines (programs) by using a truly random seed of O(log^{3/2} n) bits, thus showing that BPL is in deterministic O(log^{3/2} n) space. It has also been shown [1] that RL (one-sided error) is in deterministic O(log^{4/3} n) space.

Much is known about random walks on undirected graphs, and these results predated a discovery by Omer Reingold. Reingold [14] showed that any undirected graph can be searched deterministically using only O(log n) work space. To appreciate how surprising this result is, consider that it takes log n bits just to write down a single vertex! His work established that SL = L. SL is the class of problems that can be solved by symmetric Turing machines (programs where each step is reversible) that use only O(log n) work space. Each problem in the SL complexity class can be solved by searching a corresponding undirected graph. Another way of viewing this result is that certain discrete reversible Markov chains can be simulated deterministically, with no loss of efficiency. In some sense all of the Markov chains used in computer simulations are discrete, since they generally use finite precision floating point numbers. While the complexity theoretic motivation for studying random walks on undirected graphs has evaporated, the results obtained are still applicable to Markov chains, and [10] gives results that relate random walks on undirected graphs to reversible Markov chains. Following Reingold's proof that undirected graph search is in L, the research focus has shifted to random walks on directed graphs in an attempt to clarify the relation between the classes L, RL, NL, and others.
In the following, we summarize some of the known results on random walks on undirected and directed graphs, and then we present some of our new results [5]. The new results relate the cover and mixing times (defined below) of the random walks to a structural property of the graphs that measures how unbalanced they are. We extend our definition of digraph imbalance [5] to Markov chains and show how imbalance affects performance.

In some discrete cases, for example in simulated annealing as applied to Sudoku puzzles [12,11], random walks can potentially take exponential expected time, depending on the input size, to reach a goal state. One reason for this is that the number of possible states is extremely large compared to the input size. Another imbalance occurs if some of the state transition probabilities are exponentially smaller (in the size of the input) than the others. This is certainly not a desirable property, and most applications seek a Markov chain with probabilities that are neither too small nor too large. In practice, if the acceptance probability of a move is very small, the random walk will not make sufficient progress toward convergence. Conversely, if the acceptance probability of a move is too large, this implies a poor sampling of the state space by the walk. Moreover, rounding is often used to bound the precision of the probabilities' representation. Thus we will consider Markov chains that avoid these two conditions. Of course, even in the absence of these two imbalance conditions, Markov chains may still have exponential mixing and cover times.

We present results for randomized graph search and then relate these results to the random walks arising in Monte Carlo methods applied to optimization (search) problems. We adapt two measures used for digraphs to Markov chains. The first measure is the directed Cheeger constant, defined by Chung [6], which bounds the mixing time.
The second measure is the number of asymmetric vertices, defined by Cheng [5], which bounds the cover time. We relate our results on the number of asymmetric vertices to the expected hitting time of a goal state. Additionally, we relate the graph theoretic results to the use of a Monte Carlo method to construct an accurate population sample of a Markov chain. Both the directed Cheeger constant and the number of asymmetric vertices can be modified to obtain bounds on the mixing time for certain Markov chain Monte Carlo (MCMC) methods applied to discrete problem domains. In the former case we obtain direct bounds on the mixing time, while in the latter case we use bounds on the cover time to obtain bounds on the mixing time.

2. Random walks on graphs

In this paper we shall employ the standard notation used in the asymptotic analysis of algorithms.

Definition 1. Given two functions f and g, we write f(n) = O(g(n)) if, for all sufficiently large n, f(n) ≤ c1 g(n) for some constant c1; f(n) = Ω(g(n)) if, for all sufficiently large n, f(n) ≥ c2 g(n) for some positive constant c2; and f(n) = Θ(g(n)) if f(n) = O(g(n)) and g(n) = O(f(n)).


2.1. Results for undirected graphs

The relation between random walks on undirected graphs and electrical networks has been well studied. Doyle and Snell [8] stated the basic results showing the correspondence between random walks on connected undirected graphs and electrical networks. In this section we survey these results.

Definition 2. A graph G = (V, E) consists of a set V of vertices and a binary relation E (called edges) on V. If the relation E is symmetric (the edges are bidirectional), G is called an undirected graph or simple graph. If E is not required to be symmetric (the edges are directed), G is called a directed graph or digraph.

Let G be a graph with n vertices and m edges. Let d(v) denote the degree of each vertex v, that is, the number of edges incident on vertex v. dmin and dmax are the minimum and maximum degrees, respectively, where the minimum and maximum are taken over all vertices.

Definition 3. Let G be a digraph with n vertices and m directed edges. Let d−(v) denote the number of directed edges of the form (u, v) (entering v), called the in-degree of v. Let d+(v) denote the number of directed edges of the form (v, u) (emanating from v), called the out-degree of v. A digraph is balanced if, for each vertex v, d+(v) = d−(v). Eulerian digraphs are connected and balanced; in this case we denote d+(v) = d−(v) by d(v). A Hamiltonian path in a graph is a path that visits each vertex exactly once.

1. A random walk is a sequence of vertices [v0, v1, ..., vk] in which each (vi−1, vi) is a directed edge, chosen with probability 1/d+(vi−1). P denotes the transition matrix.
2. A 1/2 lazy random walk is a walk that chooses to stay at the current vertex, i.e. rejects the move, with probability 1/2.
3. Hit(u, v) is the expected number of steps taken by a random walk on a graph G starting from vertex u and first reaching vertex v.
4. The hitting time of graph G, Hit(G), is the maximum of Hit(u, v) over all pairs of vertices of G.
5. Cover(G) is the maximum, over all vertices v of graph G, of the expected number of steps taken by a random walk starting at v to visit all vertices.
6. Commute(u, v) is the expected number of steps taken by a random walk starting from vertex u to visit vertex v and then return to vertex u.

Definition 4. When a random walk has a unique stationary distribution π, one can measure the rate of convergence to this distribution in a given norm (typically the ℓ2 norm) to within some error bound ϵ. The mixing time of a random walk is defined as τ(ϵ) = min{t : ‖ρ_{t′} − π‖ < ϵ for all t′ ≥ t}, where ρ_{t′} is the distribution at time t′.

Definition 5. Let poly(x, y) denote some bivariate polynomial in x and y. A random walk is rapidly mixing if τ(ϵ) is poly(n, log(1/ϵ)), where n is the input bit size.

We can transform a graph G into an electrical network N(G) by replacing each edge of G by a 1 ohm resistor. The effective resistance R(u, v) between two vertices u and v is the voltage induced between u and v by sending one ampere (unit) of current from u to v. The voltage at u with respect to v, f(u, v), is measured by injecting d(x) units (amperes) of current into each vertex x of G and removing 2m units from v. Chandra et al. [3,4] showed that the length of a random walk on a graph G is related to the resistance and voltage in the corresponding electrical network N(G). Let R(G) be the maximum effective resistance over all pairs of vertices. From Chandra et al. [3,4], we have the following theorems.

Theorem 2.1. Let G be a connected undirected graph with m edges, and let u and v be any two vertices of G. Then Hit(u, v) = f(u, v) and Commute(u, v) = Hit(u, v) + Hit(v, u) = 2m R(u, v).

Theorem 2.2. For a given connected undirected graph G with n vertices and m edges, m R(G) ≤ Cover(G) ≤ O(m R(G) log n).
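The paper contains no code, but Theorem 2.1 is easy to check numerically. The following Python sketch (the 4-vertex graph and all names are illustrative assumptions, not taken from the paper) computes Hit(u, v) by solving the standard linear system for expected hitting times, computes R(u, v) from the pseudoinverse of the graph Laplacian, and compares Commute(u, v) with 2m R(u, v):

```python
import numpy as np

# Illustrative undirected graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n, m = 4, len(edges)

A = np.zeros((n, n))
for a, b in edges:
    A[a, b] = A[b, a] = 1
deg = A.sum(axis=1)
P = A / deg[:, None]                       # simple random walk transition matrix

def hit(P, target):
    """Expected steps to first reach `target`, by solving (I - Q) h = 1."""
    idx = [i for i in range(len(P)) if i != target]
    Q = P[np.ix_(idx, idx)]
    h = np.linalg.solve(np.eye(len(idx)) - Q, np.ones(len(idx)))
    return {v: h[k] for k, v in enumerate(idx)}

# Effective resistance via the pseudoinverse of the Laplacian D - A.
Lplus = np.linalg.pinv(np.diag(deg) - A)
def resistance(u, v):
    return Lplus[u, u] + Lplus[v, v] - 2 * Lplus[u, v]

u, v = 1, 3
commute = hit(P, v)[u] + hit(P, u)[v]
print(commute, 2 * m * resistance(u, v))   # Theorem 2.1: the two values agree
```

On any connected undirected graph the two printed values agree up to floating point error, as Theorem 2.1 asserts.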


Definition 6. Given a connected undirected graph G with n vertices, let D be the diagonal degree matrix and A the adjacency matrix of G, and define the Laplacian matrix L = D − A. Define σ(G) to be the second smallest eigenvalue of L, also called the spectral gap. 1 − λ(G) gives the expansion of the graph, where λ(G) is the second largest eigenvalue of A.

Theorem 2.3. If G is an undirected connected graph on n vertices, with minimum degree dmin and maximum degree dmax, then 1/(nσ(G)) ≤ R(G) ≤ 2/σ(G) and (1 − λ(G)) dmin ≤ σ(G) ≤ (1 − λ(G)) dmax.

Cover time is affected by the structure of graphs. There are some well-known bounds on the cover time of undirected graphs with n vertices: n^2 for line graphs, n log n for expander graphs, and n^3 for lollipop graphs [9]. For directed graphs, the cover time can be exponentially large.

An analysis of the convergence and cover times for random walks on digraphs is really an analysis of the "flow" of probability through the network. In this sense the electrical analogy can be applied. It is well known from the theory of network flows that the "min-cut", the smallest bottleneck in an undirected graph, gives a bound on the flow through the network. The conductance of a cut is the number of edges crossing the cut divided by the smaller of the two sets of vertices comprising the cut. The graph conductance, h(G), is the minimum conductance over all cuts; it is also called the "Cheeger constant" of the graph [10].

The study of the properties of undirected graphs in relation to the eigenvalues of the Laplacian matrix L yields rich results. It can be applied directly to the behavior of random walks, yielding bounds on mixing and cover times, and relating the Laplacian eigenvalues to the conductance. These results have been applied to reversible Markov chains [10].

2.2.
Previous results for directed graphs

The situation changes in directed graphs, since cuts have directed edges crossing them and the flow of probability depends on these directions. However, the notion of conductance can be recovered in the directed case, and it relates directly to the mixing time [6,10]. Since the adjacency matrices of digraphs are in general asymmetric, and the corresponding Markov chains irreversible, the graph Laplacian as defined for an undirected graph is no longer meaningful. However, Chung [6] defines a digraph Laplacian matrix and the circulation on a directed graph.

Definition 7. The Laplacian of a digraph G is defined by

L = I − (D^{1/2} P D^{−1/2} + D^{−1/2} P* D^{1/2})/2,

where P* denotes the conjugate transpose of P.
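Definition 7 can be made concrete with a small numerical sketch. Below, D is taken to be diag(π), the diagonal matrix of the stationary distribution, as in Chung's construction, and the 3-state transition matrix is a hypothetical example; the code builds L and extracts its second smallest eigenvalue σ(G):

```python
import numpy as np

# Hypothetical strongly connected 3-state digraph, given by its transition
# matrix P (rows sum to 1).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

# Digraph Laplacian of Definition 7, with D = diag(pi).
Dh = np.diag(np.sqrt(pi))
Dih = np.diag(1 / np.sqrt(pi))
L = np.eye(3) - (Dh @ P @ Dih + Dih @ P.T @ Dh) / 2   # symmetric by construction

sigma = np.sort(np.linalg.eigvalsh(L))[1]             # second smallest eigenvalue
print(sigma)
```

The smallest eigenvalue of L is 0, with eigenvector D^{1/2}·1; for a strongly connected digraph the second smallest, σ(G), is strictly positive.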

Definition 8. In a directed graph G, let F be a function mapping the edges of G to the non-negative reals, F : E(G) → ℝ+₀, that assigns to each directed edge (u, v) a non-negative value F(u, v). F is a circulation if, at each vertex v,

∑_u F(u, v) = ∑_w F(v, w).

Chung then proves:

Lemma 2.4. For a directed graph G, the eigenvector π of the transition probability matrix P having eigenvalue 1 (π is the stationary distribution) is associated with a circulation Fπ as follows: for (u, v) ∈ E(G), Fπ(u, v) = π(u)P(u, v).

Definition 9. In a directed graph G, let S denote a subset of vertices. The out-boundary of S, denoted by ∂S, consists of all edges (u, v) with u ∈ S and v ∉ S, and

F(∂S) = ∑_{(u,v)∈∂S} F(u, v).
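Lemma 2.4 can be verified numerically: the flow Fπ(u, v) = π(u)P(u, v) conserves probability at every vertex. A minimal sketch (the 4-state chain is a hypothetical example):

```python
import numpy as np

# Hypothetical strongly connected 4-state transition matrix (rows sum to 1).
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0]])

w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

F = pi[:, None] * P            # F_pi(u, v) = pi(u) P(u, v), edge by edge

# Circulation property (Definition 8): flow in equals flow out at every vertex.
inflow, outflow = F.sum(axis=0), F.sum(axis=1)
print(np.allclose(inflow, outflow))   # True
```

The outflow at v is π(v) because each row of P sums to 1, and the inflow is π(v) by stationarity, which is exactly why Fπ is a circulation.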

Fig. 1. Graph with only two unbalanced vertices.

Definition 10. For a strongly connected graph G with stationary distribution π, define the Cheeger constant with the circulation flow Fπ as follows:

h(G) = inf_S Fπ(∂S) / min{Fπ(S), Fπ(S̄)},

where S ranges over all non-empty proper subsets of the vertex set of G and S̄ denotes the complement of S.

Theorem 2.5. Let G be a directed graph with eigenvalues σi of the digraph Laplacian. Then σ(G), the second smallest eigenvalue, satisfies 2h(G) ≥ σ(G) ≥ h²(G)/2, where h(G) is the Cheeger constant of G.

We summarize some useful theorems on the Cheeger constant from Chung's paper [6].

Lemma 2.6. For a strongly connected Eulerian directed graph G on m edges, we have h(G) ≥ 2/m.

Lemma 2.7. For some directed graphs G with bounded out-degrees, the Cheeger constant of G can be exponentially small, i.e. h(G) ≤ c^{−n} for some constant c.

Theorem 2.8. Suppose that G is a strongly connected directed graph on n vertices. Then G has a lazy random walk with rate of convergence of order 2σ(G)^{−1}(−log min_x π(x)), where π(x) is the entry of π corresponding to x. This gives a mixing time t = 2σ(G)^{−1}((−log min_x π(x)) + 2c), where c is some constant and ϵ = e^{−c}.

Theorem 2.9. A strongly connected Eulerian directed graph G with m edges has a lazy random walk with rate of convergence no more than m² log m.

2.3. New results for directed graphs

Another measure similar to the conductance is the spectral expansion of a digraph [10]. However, there has been little work on applying these measures to weighted digraphs, which are the kind of digraphs that correspond to Markov chains. Another weakness, from the point of view of searching for a goal vertex or state, is that these measures bound the mixing time but not the cover time. To remedy this situation we introduced a new, basic measure of the imbalance of a digraph: the number of asymmetric vertices [5]. Asymmetric vertices are the vertices of a digraph, relative to a maximal balanced subgraph, that have at least one out-edge that is not in the maximal balanced subgraph. The number of asymmetric vertices is invariant over all maximal balanced subgraphs. This measure yields bounds on the cover time.
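Counting asymmetric vertices can be sketched in code. A balanced subgraph is an edge-disjoint union of directed cycles, so greedily peeling cycles until none remain leaves behind a maximal balanced subgraph; since the count is invariant over all maximal balanced subgraphs, the greedy choice suffices. The graphs in the final two lines are illustrative examples, not taken from the paper:

```python
from collections import defaultdict

def find_cycle(adj):
    """Return the edge list of some directed cycle in `adj`, or None."""
    color = {u: 0 for u in adj}            # 0 = unvisited, 1 = on stack, 2 = done

    def dfs(u, stack):
        color[u] = 1
        stack.append(u)
        for v in adj[u]:
            if color.get(v, 2) == 1:       # back edge closes a cycle
                k = stack.index(v)
                return list(zip(stack[k:], stack[k + 1:] + [v]))
            if color.get(v, 2) == 0:
                cyc = dfs(v, stack)
                if cyc:
                    return cyc
        color[u] = 2
        stack.pop()
        return None

    for s in list(adj):
        if color[s] == 0:
            cyc = dfs(s, [])
            if cyc:
                return cyc
    return None

def asymmetric_count(edges):
    """Peel directed cycles greedily; the removed edges form a maximal balanced
    subgraph, and the vertices keeping a leftover out-edge are asymmetric."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    while (cyc := find_cycle(adj)) is not None:
        for u, v in cyc:
            adj[u].remove(v)
    return sum(1 for u in adj if adj[u])

print(asymmetric_count([(0, 1), (1, 2), (2, 0)]))           # balanced 3-cycle: 0
print(asymmetric_count([(0, 1), (1, 2), (2, 0), (0, 2)]))   # extra chord at 0: 1
```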
We motivate our definition with the example shown in Fig. 1. This graph G has only two unbalanced vertices and, moreover, can be easily balanced by the addition of a single directed edge. The addition of this single edge changes the hitting time from exponential to polynomial. However, notice that many of the edges are not in the maximal balanced subgraph, and thus every other vertex in the chain is asymmetric.

Lemma 2.10. For G, Hit(G) = Cover(G) = Θ(φ^n), where φ = (1 + √5)/2 (the golden ratio).

Proof. Let the vertices of G be labeled from left to right by 0 to n. Then

Hit(v0, v1) = Hit(v1, v2) = 1,
Hit(vi, vi+1) = 1/2 + 1/2 (1 + Hit(vi−2, vi−1) + Hit(vi−1, vi) + Hit(vi, vi+1)),

so that

Hit(vi, vi+1) = 2 + Hit(vi−2, vi−1) + Hit(vi−1, vi)   for 2 ≤ i < n,

and

Hit(v0, vn) = ∑_{i=0}^{n−1} Hit(vi, vi+1).

Hit(vi, vi+1) grows in proportion to the well-known Fibonacci sequence, so that Hit(v0, vn) grows as the sum of the first n Fibonacci numbers, yielding Hit(G) = Hit(v0, vn) = Θ(φ^{n+2}) = Θ(φ^n). If we add the single directed edge from v1 to vn−1 to G, then G becomes balanced. A balanced digraph has polynomial hitting and cover time, by a result of [6].

The following result is also presented in [6]:

Lemma 2.11. If a graph G is strongly connected, a 1/2 lazy random walk converges to a unique stationary distribution π.

Definition 11. A Strong Chain is a simple strongly connected directed graph which contains a directed Hamiltonian path as a subgraph [5].

Two important results proved in Ref. [5] are the following:

Theorem 2.12. Let C = (V_C, E_C) be a Strong Chain with constant out-degree and ℓ asymmetric vertices. If ℓ = O(log n), then Cover(C) = O(poly(n)).

Theorem 2.13. There exist some strongly connected digraphs with bounded out-degree d + 1 and ℓ asymmetric vertices with cover time Ω((n − ℓ)d^ℓ). All strongly connected digraphs with bounded out-degree d + 1 and ℓ asymmetric vertices have cover time O((n − ℓ)d^ℓ). The worst case behavior for each family:

1. If ℓ = O(log n), the cover time is Θ(n^{O(log d)}); Θ(poly(n)) for constant d.
2. If ℓ = ω(log n), the cover time is super-polynomial.
3. If ℓ = Θ(log² n), the cover time is Ω(n^{O(log d log n)}).
4. If ℓ = ω(log² n), the cover time is Ω(d^{poly(n)}).
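The recurrence in the proof of Lemma 2.10 can be evaluated directly to see the golden ratio growth. A short illustrative sketch:

```python
# Recurrence from the proof of Lemma 2.10:
#   Hit(v0, v1) = Hit(v1, v2) = 1
#   Hit(vi, vi+1) = 2 + Hit(vi-2, vi-1) + Hit(vi-1, vi)   for 2 <= i < n
n = 30
step = [1.0, 1.0]                      # step[i] = Hit(v_i, v_{i+1})
for i in range(2, n):
    step.append(2 + step[i - 2] + step[i - 1])

hit_0_n = sum(step)                    # Hit(v0, vn), exponential in n

phi = (1 + 5 ** 0.5) / 2
ratios = [step[i + 1] / step[i] for i in range(20, n - 1)]
print(all(abs(r - phi) < 1e-3 for r in ratios))   # True: growth rate approaches phi
```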

These results can be applied to discrete non-reversible Markov chains. Our random walks are Markov chains in which the probability of following an edge from vertex (state) v is just 1/d+(v). Conversely, an n state Markov chain M is represented by an n vertex weighted digraph G(M), where the edge weights between vertices of G(M) are the transition probabilities between the corresponding states of M. We will convert this to an out-regular unweighted digraph G_u(M) by the addition of multi-edges. This allows us to directly apply our notion of balance to M. The proofs of Theorems 2.12 and 2.13 give us a nice bound for certain families of nearly balanced Markov chains.

3. Markov chains and Markov chain Monte Carlo (MCMC) methods

Let us clarify our notion of balance for a Markov chain. First observe that, based on Theorem 2.13, in order to have good cover times we must have nice bounds on the degree and the number of asymmetric vertices. As discussed above, we will assume that the transition probabilities are neither very small nor very large and have a limited precision. More precisely, let us assume that the transition probabilities are rational numbers that can be written in binary fixed point notation with a fixed constant number of bits. Using this representation we can interpret each probability as an integer to determine the number of directed edges corresponding to a transition.

Definition 12. A binary fractions set is {i/2^c : 0 ≤ i ≤ 2^c}, where 1/2^c is the unit of the set.

Definition 13. Let M be an n state Markov chain with transition probabilities from a binary fractions set with unit 1/2^c. We construct the n vertex unweighted digraph G_u(M), whose out-degree is bounded by d = 2^c, as follows:

1. Let each vertex v in G_u(M) correspond to a state v in M.
2. If the transition probability from u to v in M is j/2^c, then add j directed edges from u to v in G_u(M).

We immediately get Lemma 3.1.
A random walk on G u (M) is identical to the Markov process M, and thus the mixing rates and cover times are identical.
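Definition 13 and Lemma 3.1 can be illustrated concretely. The sketch below uses a hypothetical 3-state chain with c = 2 (all probabilities are multiples of 1/4), builds the multi-edge lists of G_u(M), and checks that the uniform random walk on G_u(M) has exactly the transition matrix P:

```python
import numpy as np
from fractions import Fraction

# Hypothetical 3-state chain; probabilities come from the binary fractions set
# with unit 1/2^c, c = 2 (Definition 12). Rows sum to 1.
c = 2
P = np.array([[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
              [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
              [Fraction(3, 4), Fraction(0), Fraction(1, 4)]])

# Definition 13: transition probability j/2^c becomes j parallel directed edges.
out_edges = {u: [] for u in range(3)}
for u in range(3):
    for v in range(3):
        j = int(P[u, v] * 2 ** c)
        out_edges[u].extend([v] * j)

# Every vertex of G_u(M) is out-regular with degree d = 2^c ...
assert all(len(es) == 2 ** c for es in out_edges.values())

# ... and the uniform walk on G_u(M) has exactly the transition matrix P.
Q = np.array([[Fraction(out_edges[u].count(v), 2 ** c) for v in range(3)]
              for u in range(3)])
print((Q == P).all())   # True, as Lemma 3.1 asserts
```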


Proof. From the construction, the transition matrices of M and of the random walk on G_u(M) are identical.

In this way we can apply our graph theoretic definitions directly to Markov chains.

Definition 14. A Markov chain is strongly aperiodic if the self loop probability at each state (or the probability of rejecting a move) is at least 1/2. Thus a 1/2 lazy walk gives rise to strong aperiodicity.

Definition 15. A Markov chain M is called nicely bounded if the number of states of M is bounded by a polynomial in the number of input bits and if it has transition probabilities from a binary fractions set with unit 1/2^c, for a fixed constant c.

Definition 16. A state in a finite state Markov chain M is an asymmetric state iff the corresponding vertex of G_u(M) is asymmetric. The Cheeger constant of M is defined to be the Cheeger constant of G_u(M); λ(M) is similarly defined to be λ(G_u(M)). Thus M has k asymmetric states iff G_u(M) has k asymmetric vertices.

We will assume that each Markov chain M is ergodic, that G_u(M) is strongly connected, and that M is strongly aperiodic (e.g. a 1/2 lazy random walk), to ensure the existence of a unique stationary distribution.

Theorem 3.2. Let M be a nicely bounded, ergodic, n state Markov chain. If M has at most O(log n) asymmetric states, then the cover time of M is polynomial in the input size and M is rapidly mixing.

Proof. M is nicely bounded, which implies that G_u(M) has constant out-degree. Ergodicity implies that G_u(M) is strongly connected. Theorem 2.13 thus implies that Cover(G_u(M)) is bounded by a polynomial in n. Since M is nicely bounded, n is bounded by a polynomial in the input size, so Cover(M) is also bounded by a polynomial in the input size. Since the mixing time of M is bounded by Cover(M), this implies M is rapidly mixing.

The Cheeger constant of a Markov chain also provides a bound on the mixing time.

Theorem 3.3. Let M be a nicely bounded, ergodic, n state Markov chain.
If M has Cheeger constant h(M) > 1/p(n), where p(n) is some polynomial in n, then M is rapidly mixing.

Proof. From Theorem 2.5 and the equivalence of M and G_u(M), 2h(M) ≥ σ(M) ≥ h²(M)/2, so σ(M) > 1/(2p(n)²) and σ(M)^{−1} < 2p(n)². Using Theorem 2.8 then yields t < 4p(n)²((−log min_x π(x)) + 2c). Observe that min_x π(x) ≥ 1/dmax^n, so that −log min_x π(x) is bounded by a polynomial in n. Thus M has a rate of convergence bounded by a polynomial in n, and thus a mixing time that is polynomial in n. Since, by assumption, n is polynomial in the input size, the mixing time is also bounded by a polynomial in the input size.

Lemma A.1 of [7] provides another method of showing that a digraph is rapidly mixing. In particular, if a polynomial length 1/2 lazy random walk started from any vertex returns to a given vertex s with high probability, then the digraph is rapidly mixing.

Lemma 3.4. Let M be a Markov chain that is strongly aperiodic. Suppose there is a state s and a natural number ℓ such that, from every state v reachable from s, a random walk of length ℓ from v visits s with probability at least 1/2. Then M has a stationary distribution π such that λ_π ≤ 1 − 1/(8ℓ²).

Using the equivalence of a Markov chain M and the digraph G_u(M), we get:

Corollary 3.5. Let M be a nicely bounded, ergodic, n state Markov chain. If there exist a polynomial p(n) and a state s such that the Markov process started from any state reaches s with probability at least 1/2 after at most p(n) moves, then M is rapidly mixing.

Proof. Since M is assumed ergodic, the stationary distribution π is unique. Substituting p(n) for ℓ in Lemma 3.4 yields λ ≤ 1 − 1/(8p(n)²). Theorem 2.3 says that σ = O(1 − λ), and Theorem 2.8 gives that the mixing time is t = 2σ^{−1}((−log min_x π(x)) + 2c), where c is some constant and ϵ = e^{−c}.
Substituting O(1 − λ) for σ, and noting again that −log min_x π(x) is bounded by a polynomial in n, gives that the mixing time is t = O(p′(n)) for some polynomial p′. Finally, since M is assumed nicely bounded, this is also polynomial in the input size, so that M is rapidly mixing.
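The hypothesis and conclusion of Lemma 3.4 can be checked numerically on a toy chain: make the candidate state s absorbing to compute the probability that a length-ℓ walk visits s, then compare the second largest eigenvalue modulus of P with the 1 − 1/(8ℓ²) bound. The 3-state 1/2-lazy chain and the choice ℓ = 8 below are hypothetical examples:

```python
import numpy as np

# Hypothetical strongly aperiodic (1/2-lazy) ergodic 3-state chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
s, ell = 0, 8                             # candidate state and walk length

# Probability that a walk of length ell from each state visits s:
# make s absorbing and take column s of the ell-step matrix.
Pabs = P.copy()
Pabs[s] = 0
Pabs[s, s] = 1
reach = np.linalg.matrix_power(Pabs, ell)[:, s]
print(reach.min() >= 0.5)                 # True: hypothesis of Lemma 3.4 holds

lam = sorted(np.abs(np.linalg.eigvals(P)))[-2]   # second largest |eigenvalue|
print(lam <= 1 - 1 / (8 * ell ** 2))             # True: conclusion of Lemma 3.4
```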

We also observe that Monte Carlo algorithms to find an optimum state can be very badly behaved if the underlying Markov chain is badly balanced in the sense that we have defined above. As an immediate consequence of Theorem 2.13 we have the following theorem.

Theorem 3.6. Consider the family of nicely bounded, ergodic n state Markov chains with ω(log n) asymmetric states (the number of asymmetric states grows asymptotically faster than log n). For each n there exist Markov chains in the family with cover times superpolynomial in n.

Markov chain Monte Carlo methods are a class of algorithms for sampling from probability distributions, based on constructing Markov chains with a desired target stationary distribution. The state of the chain after a number of steps is used as a sample of the target distribution. The number of steps affects the quality of the sample, and thus the difference from the target distribution. A good chain will have rapid mixing behavior, but this is not always the case. The difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error; this is referred to as the mixing time. A random walk is used to reach the sample state. Random walk methods are easy to implement and analyze; however, it may take a long time (e.g. a walk with loops) to reach the target state. Therefore, some algorithms use self-avoiding random walks to prevent returning to visited states. We now summarize two conditions on the Markov chain which assure rapid convergence to the stationary distribution.

Theorem 3.7. Consider an MCMC process M, where the random walk represents an n state Markov chain that is nicely bounded, ergodic, and has ℓ asymmetric states. If ℓ = O(log n), then the MCMC process is rapidly mixing. If the Cheeger constant h(M) is greater than 1/p(n), where p(n) is some polynomial in n, then M is rapidly mixing.
Proof. The random walk used for sampling satisfies the conditions of Theorem 2.13, so the cover time is polynomial in the input size, and this implies that the mixing time is also bounded by a polynomial in the input size. The mixing time bound for the second condition follows directly from Theorem 3.3.

4. Conclusion

In this paper we have summarized some of the known results on random walks on undirected and directed graphs. We presented some of our new results, which relate cover times on digraphs to a structural property that we have identified: the number of asymmetric vertices [5]. We then related these results to the random walks arising in Monte Carlo methods applied to optimization (search) problems, and to the expected hitting time of a goal state. We showed that for nicely bounded Markov chains the number of asymmetric vertices provides bounds on the hitting, cover, and mixing times. We showed how several other measures that have been used in the study of digraphs can be adapted to Markov chains to provide bounds on the mixing times, and we gave the conditions under which a nicely bounded Markov chain will be rapidly mixing. In future work we will try to extend these bounds to more general Markov processes by a more detailed examination of the relation between a Markov chain and its associated weighted digraph.

References

[1] R. Armoni, A. Ta-Shma, A. Wigderson, S. Zhou, An O(log^{4/3} n) space algorithm for (s, t) connectivity in undirected graphs, J. ACM 47 (2000) 294–311.
[2] D.K. Arrowsmith, J.W. Essam, Percolation theory on directed graphs, J. Math. Phys. 18 (2) (1977) 235–239.
[3] A.K. Chandra, P. Raghavan, W.L. Ruzzo, R. Smolensky, The electrical resistance of a graph captures its commute and cover times, in: Proceedings of the Twenty-first Annual ACM Symposium on Theory of Computing, STOC '89, ACM, New York, NY, USA, 1989, pp. 574–586.
[4] A.K. Chandra, P. Raghavan, W.L. Ruzzo, R. Smolensky, P. Tiwari, The electrical resistance of a graph captures its commute and cover times, Comput. Complexity 6 (1996) 312–340.
[5] W.-J. Cheng, J. Cox, S. Zachos, Random walks on some basic classes of digraphs, in: Proceedings of the 10th International Colloquium on Theoretical Aspects of Computing, ICTAC 2013, in: Lect. Notes in Comput. Sci., vol. 8049, 2014.
[6] F. Chung, Laplacians and the Cheeger inequality for directed graphs, Ann. Comb. 9 (1) (2005) 1–19.
[7] K.-M. Chung, O. Reingold, S. Vadhan, S-t connectivity on digraphs with a known stationary distribution, ACM Trans. Algorithms 7 (2011) 30:1–30:21.
[8] P. Doyle, J. Snell, Random walks and electrical networks, Carus Math. Monogr. 22 (1984) 1–159.


[9] U. Feige, A tight upper bound on the cover time for random walks on graphs, Random Structures Algorithms 6 (1) (1995) 51–54.
[10] J. Fill, Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process, Ann. Appl. Probab. 1 (1) (1991) 62–87.
[11] T.A. Lambert, P.A. Whitlock, Generalizing Sudoku to three dimensions, Monte Carlo Methods Appl. 16 (3–4) (2010) 251–263.
[12] R. Lewis, Metaheuristics can solve Sudoku puzzles, J. Heuristics 13 (4) (2007) 387–401.
[13] N. Nisan, A. Wigderson, Hardness vs randomness, J. Comput. System Sci. 49 (2) (1994) 149–167.
[14] O. Reingold, Undirected st-connectivity in log-space, in: H.N. Gabow, R. Fagin (Eds.), Proceedings of the 37th Annual ACM Symposium on Theory of Computing, STOC 2005, Baltimore, MD, USA, May 22–24, 2005, ACM, New York, NY, USA, 2005, pp. 376–385.