Physica A 307 (2002) 52 – 62
www.elsevier.com/locate/physa
Directing a random walker with optimal potentials Marco A.P. Idiart ∗ , Marcelo Trevisan Instituto de F sica, Universidade Federal do Rio Grande do Sul, CEP 91501-970, Porto Alegre, RS, Brazil Received 27 September 2001
Abstract In a two-dimensional discrete environment we obtain numerically the potential surface that minimizes the di.usion time for a particle that is guided toward a goal point, for a given temperature. The optimal potential shape is a branched one from the con/uence of three factors that helps direct di.usion: the reduction of the dimensionality of the walk, the optimization of the potential shape in one dimension, and the minimization of the paths. We discuss the possible c 2002 Elsevier Science B.V. All rights reserved. applications of the result to robotic navigation. PACS: 02.50.−r; 05.40.−a; 05.45.Df
1. Introduction A fundamental problem in robotic navigation is how to devise an algorithm that provides the fastest, or the shortest, path to a goal position from a generic point in a known environment. An ingenious solution for this is the method of the harmonic functions [1,2]. In the method the environment is represented by a lattice—the occupancy grid—that divides it in a number of equally sized cells. A potential value (r) is attributed to each grid point and the Laplace’s equation for the lattice is solved under the constraints that cells with obstacles have a =xed value 1 and cells with goals have a =xed value 0. The result is a discrete approximation for the continuous Laplace’s problem ∇2 (r) = 0 with the boundary values (r ∈ @o ) = 1 and (r ∈ @g ) = 0, where @o and @g are the surfaces of the obstacles and goals in the environment . The solution of the navigation problem is obtained by the trajectories that follow the gradient descent on the calculated potential. The beauty of the method is twofold. First, ∗
Corresponding author. E-mail address:
[email protected] (M.A.P. Idiart).
c 2002 Elsevier Science B.V. All rights reserved. 0378-4371/02/$ - see front matter PII: S 0 3 7 8 - 4 3 7 1 ( 0 1 ) 0 0 5 7 6 - 3
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
53
there are many parallel methods to calculate the Laplace equation in a grid which makes it suitable for practical implementations. Second, and most important, the resulting potential has no local minima besides the goal points. Therefore, any dynamics that follows the gradient descent will lead to one of the targets. An example of that is the constant speed dynamics dr ∇(r) + W(t) = −v0 ; dt |∇(r) + W(t)|
(1)
where the gradient is followed with constant speed v0 , and W(t) is a noise term that accounts for inaccuracies in the measure of the gradient. A possible /aw of the method is that the solution of the Laplace’s equation in complex environments tends to be /at, far from the goal. In these circumstances, the random term in Eq. (1) dominates over the gradient term and the dynamics becomes ineGcient—a two-dimensional random walk. This could indicate that a harmonic potential shape may not be a suitable choice in a noisy environment. The question that impels us to answer is, therefore, what is the shape that a navigation potential must have in order to provide the minimal time to reach the goal in the presence of noise? Aside from its practical application, the present problem can be turned into an interesting physical problem. In a very simple model, we can think of the robot as a particle that, because of the inherent errors, di.uses on a potential surface. The level of error can be modeled as temperature noise. The potential has a single minimum and, therefore, it guides the particle towards it. In this framework we can ask, for the di.usion problem, a question that is analogous to the famous brachistochrone problem in mechanics, that is, what is the potential shape that minimizes the mean =rst passage time of the particle to reach the target point for a given temperature? We are aware that modeling a robot as a particle besides practical engineering errors on position estimation as thermal /uctuations is farfetched. Nevertheless, we believe, as it will become clear, that this simple random walk model unveils some of the essential features regarding the optimal potential for the practical problem. As for the physical problem it is interesting by itself.
2. The general theory To model the problem we consider a Markovian process in a d-dimensional square lattice, where a particle can hop to immediate neighbor sites. We assume that the transition is biased and the probability to move from a speci=c site to another depends on the strength of a bond !ab that exists between them. The transition probability to move from site b to site a is written in terms of the bonds as
!ab Wab = e e !b b (2) b
and b is the sum over the sites’ neighbor to b. An alternative formulation in terms of a potential function a that depends on site is also possible. In this case, the transition
54
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
probability has the form e− (b −b ) Wab = e− (a −b )
(3)
b
and a direct connection between bonds and potentials is established by relations involving di.erences of these quantities, for instance, !ba −!ca =−(b −c ). Therefore, the particle has a higher probability to leave a site using the neighbor that is strongly connected, or equivalently, using the one that has a lower potential. Even though the potential formulation is more natural for the issues discussed in the last section, we choose to adopt the bond formulation due to its numerical convenience. Results in one formulation can be easily translated into the other. The main reason for choosing the bond formulation is that, since the probability to remain in a site is zero, the potential formulation has the disadvantage of uncoupling the two sub-lattices formed by alternating sites. Observe that the transition probabilities from site b does not depend on the potential value in this site in Eq. (3). This makes the numerical procedure more complicated because boundary conditions in both sub-lattices are necessary. We also consider that the bond values are symmetrical, !ab =!ba . But it is important to note that the asymmetry in the transition probabilities (Wab = Wba ) still holds, because di.erent sites have di.erent neighbors. An adequate form for the numerical treatment of the problem is obtained by writing the master equation for the conditional probability to =nd the walker at site a, assuming that it starts at site a0 at time t = 0. We call it Pa (t), for a = 1; : : : ; N , and the master equation in vector notation reads @P(t) = MP(t) ; @t
(4)
where Mab = Wab − ab with the initial condition Pa0 (0) = aa0 . To consider an absorbing site n is equivalent to set Pn (t)=0 ∀t, reducing the problem to a set of (N − 1) equations (see for example Ref. [3]), such that we write @P(t) = MP(t) ; @t
(5)
where now P is a (N − 1)-dimensional vector obtained by removing the element n from the original vector P = (P1 ; P2 ; : : : ; PN ), and M is an N − 1 × N − 1 matrix obtained by removing both the row and the column n from M. Note that the solution of (5), Pa (t), represents the probability that the walker, starting from a0 , reaches a at time t without stepping on n. The mean =rst passage time for a starting point a0 and an end point n is then obtained by integrating the probability that the walker has not yet reached n by the time t, or ∞ dt Pa (t) : (6) Ta0 (!) = 0
a
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
55
The calculation of the =rst passage time reduces in order to =nd the inverse of M, since by solving (5) we see that the expression above can be written as Ta0 (!) = (M−1 )a0 a : (7) a
A good navigation map has to be able to provide good navigation performances from any starting point in the environment, so it is more interesting to minimize the average of (7) over all sites, or 1 T (!) = (M−1 )ab : (8) N −1 a; b
In summary, to determine the optimal bond con=guration for a given distribution of obstacles and goals, the following sequence of steps has to be performed: (a) =rst the matrix Mab has to be calculated. Sites with obstacles are forbidden and as a result transitions to them are not allowed. The corresponding elements Wab are set to zero, or !ab = −∞, which is equivalent to deleting the obstacle sites from the lattice. This procedure automatically guarantees re/ective boundary conditions (Neumann) for these sites. For the goal sites, the procedure is somewhat di.erent. There are transitions that lead to these sites. To turn them into absorbing sites the corresponding lines and columns are deleted from M and the matrix M is built. The main di.erence between obstacles and goals is that, in matrix M, sites that are in the vicinity of obstacles have their transition probabilities summing up to one, while this does not happen to sites in the vicinity of a goal. (b) The next step is to invert M and to sum up all its elements. The resulting expression is a function of the bonds !ab and the temperature 1= . Our objective is to choose the bonds that minimize it for a given temperature. For di.erent temperatures we expect di.erent optimal bond con=gurations. But, among them, the minimal time is expected to occur for zero temperature. Thus, a trivial way to minimize time is by increasing the values of !ab without limits. To avoid this trivial situation we assume from now on that !ab ∈ [0; 1]. 3. The one-dimensional problem With some algebraic manipulation it is possible to show that when a0 =1 and n=N +1 are the extremes of a one-dimensional chain, the average =rst passage time from one end to another is N −1 N T1 (!) = N + 2 e− (!aa+1 −!bb+1 ) (9) a=1 b=a+1
and if we average over all the possible starting points in the chain we have N −1 N 1 2 − (!aa+1 −!bb+1 ) T (!) = (N + 1) + be : 2 N
(10)
a=1 b=a+1
Here, the importance of the bound imposed over !ab is evident. Clearly, the minima of (10) occurs when the second term vanishes, or when !aa+1 = ∞, with !aa+1 − !a−1a = ∞, but this is a trivial situation where the e.ect of noise is completely erased.
56
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
1.0
ω *(β)
0.8 0.6 0.4 0.2 0.0 0.0
0.2
0.4
0.6
0.8
1.0
β
Fig. 1. Optimal potential shapes obtained by Monte Carlo simulations for N = 50. Starting from the steepest curve we have = 0:1 and the sequence = 1; 2; 3; : : : ; 10. The dotted line corresponds to = 50.
The minima of (10) can be found through a zero temperature Monte Carlo technique. We start with all the internal bond values equal to 12 , and the extremes =xed such that !12 = 0 and !NN +1 = 1. Then the internal bonds are selected randomly and set to have its value increased or decreased to a certain amount, with equal probability. The change is only accepted if it decreases T (!). Equivalently, we can adjust the bond values through the gradient descent on T (!) or Mwab = −
@T (!) ; @!ab
(11)
where is a small parameter. We do not expect local minima in one dimension; therefore, any deterministic minimization is enough to =nd the optimal bond con=guration. For large N , the sum in (10) can be approximated to an integral. We assume that the chain has total length equal to one, with the distance between two sites ‘ = 1=N . Therefore, in this limit, we can refer to a bond function w(x) with x ∈ [0; 1], and the expression for the average =rst passage time becomes T (!) 2N 2 ( ) ; where the function ( ) is de=nite as 1 1 ( ) = dx dyye− (!(y)−!(x)) 0
x
(12)
(13)
and Fig. 1 shows the bond function that minimizes ( ) for ranging from 0.1 to 50. Increasing temperatures result in a departure from the linear (harmonic) solution !(x) = x and an increase of the abruptness of the function.
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
57
0.2
0.3 0.1
0.2
0.0
0
5
10
Γ(β)
β
0.1
0.0 0
2
4
6
8
10
β Fig. 2. The time factor as a function of the noise parameter√ . The dashed line corresponds to the linear potential !(x) = x. The dotted line is for !(x) = (x − 1= 3), the optimal potential for large . The continuous line corresponds to an interpolation a piecewise√linear √ between these two potential shapes. It is √ −(1= 3)) + (1= 3) if function of the form !(x) =√0 if x ¡ (1= 3)(2=(2 + )), !(x) = ((2 + )=2)(x √ √ (1= 3)(2=(2 + )) ¡ x ¡ (1= 3)(2 + )=(2 + ), and !(x) = 1 if x ¿ (1= 3)(2 + )=(2 + ). The open squares represent the Monte Carlo optimal solution for N = 50. The linear and the piecewise linear potentials make vanish as 1=(2 ) with large , the numerical simulation seems to indicate a similar dependency.
In the limit of very large temperatures, it is possible to obtain an analytical expression for !∗ (x). Eq. (12) becomes 1 d x (3x2 − 1) !(x) (14) T (!) = N 2 2=3 − 0
∗ step and the condition time implies √ that ! (x) is a step function∗ with the √ of minimal ∗ placed√at x = 1= 3, or ! (x) = (x − 1= 3). The optimal time is T (! ) = N 2 (2=3 −
2=(3 3)) against a Th = N 2 (2=3 − 1=4) for !(x) = x. Fig. 2 displays the behavior of ( ) for four choices of the bond function. The step solution mentioned above (dotted line) has its best performance for small , but is rapidly superseded by the harmonic solution (dashed line). The numerically obtained optimal bond function (symbols), see Fig. 1, can be approximated by a piecewise linear function that interpolates between the step and the harmonic solution and gives comparable results (full line). As expected, for large ’s, the actual shape of the !(x) is of no importance. For very low temperatures all monotonous functions give the same result in one dimension. The optimization of !(x) reveals a maximal economy in time of order of 15%, as seen in the inset of Fig. 2 in the comparison between the harmonic solution and the piecewise approximation. Despite the small gain, this allows us to understand the nature of the optimal solution, that is to propitiate localized transition barriers for the particle. This is a good strategy because for high temperatures, the particle is relatively
58
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
blind to small values of the gradient, so the best is to concentrate high gradients in suitable places. 4. The two-dimensional problem For the two-dimensional case, we choose to study numerically the problem of a walk over a square lattice. The dimensions of the lattice are N by (N + 1)=2 (considering N odd), and the absorbing site is located at one of the corners. In Fig. 3, the results displayed are re/ected to look like an N × N lattice, with the target placed in the middle of one of the sides. Each con=guration corresponds to a di.erent temperature. The value of the bond, ranging from 0 to 1, is represented by the thickness of the lines. As in the one-dimensional case, the con=gurations were obtained by minimizing the =rst passage time, expression (8), by changing bond by bond in a zero temperature Monte Carlo style. But di.erent from that case, there are many local minima so we performed di.erent runs. But in all runs we obtained qualitatively by similarly con=gurations with very close values for the minimal time T (!∗ ). From the =gure we conclude that the optimal structure for the potential in two dimensions is governed by three factors, or principles. The =rst is related with the dimension dependence of a random walk. It is a known property of a di.using particle in Euclidean space such that its mean square displacement in a time “t” is independent of the dimension. More speci=cally, r 2 ∼ t. But when we ask for the time to attain a given point, the number of dimensions can be very important [4]. In our problem we precisely need to guide a particle down to the goal as fast as possible. We can increase its speed if we reduce the dimensionality of the walk. This can be accomplished by increasing the roughness of the potential in the direction perpendicular to the desired movement. This is observed in all con=gurations shown in Fig. 3. The target is localized in the center of the lower border of the environment, because of that the particle has to approach the center and move down. As a result of the optimization process, we see a system of “channels” with parallel lines approaching the center or moving down. The second e.ect is the guiding power of the slope inside a channel. This was already discussed for the one-dimensional case. The eGciency of the channel internal gradient to guide the particle towards the target varies with the noise. In fact, to attain the best performance the potential has to adapt to the noise. For increasing noise, the best strategy is to concentrate high gradients in key positions. In Fig. 3 this is apparent by the retraction and thickening of the channels. The third factor is the minimization of the path. To be eGcient, the channel system has to be designed to provide the minimal path to the target. In our simulations, the choice of a square lattice with nearest neighbors is somewhat unfortunate, because it causes a large degeneracy for the minimal path, which hides this e.ect. Fig. 4 shows how the average time to the target varies with the di.erent potential con=gurations. As before, we compare di.erent structures at di.erent noise levels. The structures present are the bond con=guration for = 0:1 (dotted line), representing
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
59
Fig. 3. Optimal potential shapes obtained by Monte Carlo simulations for a two-dimensional case for a square lattice with N = 11 × 11 sites. The absorbing site (target) is down in the center. (a) = 0:1, (b) = 1:0, (c) = 2:0, (d) = 4:0, (e) = 6:0, (f) = 8:0, (g) = 10:0 and (h) = 1000:0.
60
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62 1.0 0.4
0.8 ∆t/t
0.2 0.0
t / t max
0.6 -0.2 0
5
0.4
β
10
15
0.2
0.0 0
5
β
10
15
Fig. 4. The relative value of the mean =rst passage time in a square lattice with a two-dimensional N =11×11 lattice as a function of = 1=T . The dotted line corresponds to the optimal bond con=guration for = 0:1, the continuous line is the same for = 4 and the dot–dashed line is the same for = 100. The dashed line corresponds to a “tent” con=guration of the bonds. In this con=guration, the horizontal bonds increase linearly from zero, at the right and left boundaries, to a maximal value at the center; and the vertical bonds increase linearly from zero at the upper boundary to a maximal value at the lower boundary, subject to the bonds that connect to the target, which has value one. tmax is the average time it takes for the particle to reach the target in an unbiased walk.
the optimal structure for high temperatures; the bond con=guration for = 4:0 (full line), representing an intermediate stage; the bond con=guration for =100:0 (dot-dash line) to representing a low temperature optimal; and a “tent” con=guration of bonds representing a harmonic-like solution. In this case, we observe that for = 4:0 there is a considerable gain, of order 40%, when we choose the optimal solution for that temperature in the place of the “tent” potential. 5. Conclusion Using simulations in discrete environments we studied the optimal bond con=guration for a di.using particle that is guided toward a target point in 1-D and 2-D. For 2-D, the result can be systematized in terms of three principles. The =rst principle is the reduction of the dimensionality of the walk. This is responsible for the appearance of what we can call “rivers”, or “channels”, that are unidimensional traps that reduce the dimensionality of the walk from two to one and therefore, reduce the time taken for the particle to be absorbed. The other principle is the guiding principle that explains what is the best choice for the gradient inside a river to minimize transit time for a given noise . If we restrict the potential to be between 0 and 1, then there is not much space for large gradients, and to accomplish it we have to impose /atness in others regions. But the loss of having no gradients in these regions is compensated once the walker passes the barrier.
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
61
The last principle is related with the minimization of the river lengths, that tends to orient the rivers as much as possible towards the targets. It is worthy to mention here that the principles listed above do not compare to the principles responsible for river formation in the context of energy dissipation in real river networks [6]. In particular, from these studies it is possible to derive an energy function that depends on two competing factors: the channel length and the channel /ow. The reduction of channel length by connecting sites to close neighbors results in high /ows for some of the channels, while low /ows imply long channels connecting distant sites. The branched structure arises naturally when a compromise between the two tendencies is achieved [5]. Here, since there is no competing e.ects, what opposes a solution where there is a series of paths leading radially to the target is the discreteness of the lattice. In the continuous space, there is no limit for the roughness of a surface. The discreteness is an artifact for river models but not for navigations map. A navigation map has to be computable in real time, which makes the discreteness a necessary ingredient. For the navigation problem, the conclusion is that the branched—fractal-like— structures that appear in the context of growth models as well as in channel organization of real river networks could have practical application since they are capable of producing maps that can deal with inaccuracies in navigation. The next step to follow is, therefore, to pursue a mechanism, or a model, that can naturally generate the optimal structures showed here. Once this mechanism is found it can replace Laplace’s calculation in the practical problems. Some studies in natural navigation could shed some light on it. Blum and Abbott [7] proposed that, in rats, Hebbian learning acting on the excitatory connections in the hippocampus can cause a shift in the rat’s estimated position provided by a neural network map that exists in this region [8]. In this model, the discrepancy in the estimated position indicates the direction to follow in the same manner as that of the gradient of the potential, or the bond function, discussed here for the robot or the walker. The information is stored in the synaptic connections of the hippocampal network and can be updated while the animal behaves. A recent study indicates [9] that an optimization of this process also leads to vector =elds that have branched structures. These structures arise from the feedback given by the map on the rat’s choices of trajectories. The frequent passage by a location with the same direction of movement causes an increase in the probability that the same direction will be elected next time. This reinforcement resembles some other class of real river network models, where /ow of water sculpts the riverbed and at same time is determined by gradients in the riverbed soil [10,11]. Therefore, the reinforcement learning mechanism in a neural network could be the mechanism we are searching for. Acknowledgements This work has been partially supported by Brazilian agencies Conselho Nacional de Desenvolvimento CientRS=co e TecnolRogico (CNPq), Financiadora de Estudos e Projetos (FINEP) and FundaTca˜ o de Amparo aV Pesquisa do Rio Grande do Sul (FAPERGS).
62
M.A.P. Idiart, M. Trevisan / Physica A 307 (2002) 52 – 62
References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]
C.I. Connolly, R.A. Grupen, J. Robotic Sys. 10 (1993) 931–946. C.I. Connolly, Int. J. Robotics Res. 16 (1996) 497–507. L.E. Reichl, A Modern Course in Statistical Physics, 2nd Edition, Wiley, New York, 1998. V. Balakrishnan, Mat. Sci. Eng. B 32 (1995) 201–210. A. Rinaldo et al., Water Resour. Res. 28 (1992) 2183–2195. T. Sun, P. Meakin, T. Jossang, Phys. Rev. E 51 (1995) 5353–5359. K.I. Blum, L.F. Abbott, Neural Comput. 8 (1996) 85–93. E.T. Rolls, A. Treves, Neural Networks and Brain Function, Oxford University Press, Oxford, 1998. M.A.P. Idiart, M. Trevisan, Network: Computation in Neural Systems, 2000, submitted. R.L. Leheny, S.R. Nagel, Phys. Rev. Lett. 71 (1993) 1470–1473. A. Giacometti, A. Maritan, J.R. Banavar, Phys. Rev. Lett. 75 (1995) 577–580.