Tempered Fermions in the hybrid Monte Carlo algorithm

Tempered Fermions in the hybrid Monte Carlo algorithm

UCLEAR PHYSICI Nuclear Physics B (Proc. Suppl.) 60A (1998) 341-344 ELSEVIER PROCEEDINGS SUPPLEMENTS Tempered Fermions in the Hybrid Monte Carlo Al...

296KB Sizes 0 Downloads 64 Views

UCLEAR PHYSICI

Nuclear Physics B (Proc. Suppl.) 60A (1998) 341-344

ELSEVIER

PROCEEDINGS SUPPLEMENTS

Tempered Fermions in the Hybrid Monte Carlo Algorithm G. BOYD ~ Center for Computational Physics, University of Tsukuba, Tsukuba, Ibaraki 305, Japan. email:[email protected] Parallel tempering is used to simulate at many quark masses simultaneously, and to change the mass used for a given configuration, while remaining in equilibrium. The algorithm is best for small quark masses, especially if a chiral extrapolation is required.

1. I N T R O D U C T I O N The standard algorithms used for simulations of full QCD are slow, although no one knows just how slow! Physically large objects, like instantons, pions etc., may well decorrelate much more slowly than, say, the plaquette, although the evidence is not conclusive [1]. Tempered algorithms [1-3] promote a parameter of the theory to a dynamical variable that changes during the simulation. They have been successfully implemented in /3 for spin glasses, U(1) etc. For QCD at zero temperature promoting/3 does not bring any benefit (although at high temperature, around the chiral phase transition it probably will). This leaves the quark mass as the variable to promote, by allowing the mass to change after each trajectory of a standard hybrid Monte Carlo ~ algorithm [4]. The choice of a new mass is made with at Metropolis step, and insures that the change is only made if the configuration is also part of the equilibrium distribution of the new mass. The gain comes from running at heavier (faster) masses between independent configurations, and also if the auto-correlations of large physical objects are smaller for larger masses.

2. T E M P E R E D

ALGORITHMS

The quark mass, amq for staggered or n for Wilson fermions, becomes a dynamic variable, and may take a different value for each trajectory. (Regard m below as the corresponding ~ for Wilson fermions.) The masses used belong to an 0920-5632/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0920-5632(97)00495-7

ordered set with Nm elements, [mmin, ..., mmax]. The only requirement is that the action histograms of neighbouring masses overlap. The probability distribution now simulated is P(U, ¢, i), the probability of having the configuration with the gauge fields U and pseudo-fermion fields ¢ generated at quark mass mi. For each i it is the same as the original distribution P(U, ¢). There are two types of tempering, simulated and parallel tempering. In the former a single configuration is updated, and the mass may change between trajectories. The latter, a better, although more memory intensive method, updates many configurations, one for each mass. 2.1. T E M P E R I N G

The idea behind simulated tempering in QCD [3], is very simple. As it is the basis for parallel tempering, it will be introduced first. Introduce a constant gi for each mass mi, which indicates roughly where the half-way point between the action histograms of mass mi and mi+l is. The original Q C D probability function now becomes

P(U, ¢, mi) c< e x p [ - S ( U , ¢,/3, mi) + g/]. This distribution is simulated using your favourite algorithm for fixed quark mass (here HMC), combined with Metropolis steps to change from mi to mi:t:l. The hybrid Monte Carlo algorithm insures that the correct Gibbs distribution is generated at each value of the mass, and the tempering Metropolis step insures that the mass only changes if the configuration is part of the equilib-

G. Boyd/Nuclear Physics B (Proc. Suppl.) 60A (1998) 341-344

342

rium d i s t r i b u t i o n of b o t h masses. T h e c o n s t a n t s gi can be chosen freely, d e p e n d ing on w h a t seems best for t h e s i m u l a t i o n . T h e y do not affect the physics, only t h e frequency with which each m a s s is visited. T h e gi can be fixed by choosing, for e x a m p l e , to visit each m a s s with equal p r o b a b i l i t y , P(i) = 1/Nm. T h e n gi ---- - - I n Zi, ie. t h e original free energy at fixed mass mi. T h e choice is a r b i t r a r y , t h o u g h , a n d can be o p t i m i z e d for speed. T h e s i m u l a t i o n only needs gi+l - g i , a n d a g o o d s t a r t i n g p o i n t is to t a k e t h e first two t e r m s below:

Ag =

2+

3)

where (~b~} a n d (X} a r e t h e chiral c o n d e n s a t e and susceptibility. T h e r e q u i r e m e n t of overlapping h i s t o g r a m s implies t h a t ~m satisfies

& n ~ 1/

{X/7~m./v'~.

(1)

T h e o v e r h e a d d e p e n d s on the step size &n. As t h e s u s c e p t i b i l i t y is r e l a t e d to t h e scalar meson mass, X = A~,/rn~, t h e s t e p size is large in t h e chiral limit. Hence s i m u l a t e d t e m p e r i n g becomes m o r e effective for very small q u a r k masses, with the gain in speed m o r e t h a n c o m p e n s a t i n g for t h e N2m cost of h a v i n g a d d i t i o n a l masses. T h e v o l u m e d e p e n d e n c e a b o v e is in units of some relevant c o r r e l a t i o n length, so t a k i n g t h e c o n t i n u u m limit in fixed physical v o l u m e does not cause t h e m e t h o d to b r e a k down. 2.2. P A R A L L E L T E M P E R I N G If one has Nm different masses, one can happily do Arm different s i m u l a t e d t e m p e r i n g runs s i m u l t a n e o u s l y on different c o m p u t e r s . You can even go one b e t t e r , a n d p u t t h e m t o g e t h e r on one c o m p u t e r in a way t h a t removes the need for the c o n s t a n t s gi a n d i m p r o v e s t h e p e r f o r m a n c e . This is called p a r a l l e l t e m p e r i n g . Here, first g e n e r a t e one configuration Ci at each of the masses. T h e n use a M e t r o p o l i s s t e p to decide w h e t h e r t h e configuration Ci at mass m i , a n d c o n f i g u r a t i o n Ci+l at m a s s m i + l should be s w a p p e d for the next t r a j e c t o r y . If this is done, t h e next t r a j e c t o r y will run with Ci+l at m a s s m i , a n d Ci at m a s s mi+l. A f t e r each t r a j e c t o r y one s t a r t s t r y i n g to swap masses 1 a n d 2, moving

up t h r o u g h t h e list, e n d i n g by t r y i n g to s w a p t h e configurations at masses Arm - 1 a n d Arm. T h i s m e t h o d d o e s n ' t n e e d a n y c o n s t a n t s gi, changes the m a s s at two r a t h e r t h a n one configu r a t i o n , a n d has the f u r t h e r a d v a n t a g e of genera t i n g a c o n f i g u r a t i o n at each m a s s e v e r y t r a j e c tory. Also, t r a j e c t o r i e s w a y o u t in t h e t a i l of t h e d i s t r i b u t i o n s t a n d a chance of m o v i n g m o r e t h a n one step in mass. T h i s d o e s n ' t h a p p e n v e r y often though! 2.3. A C C E L E R A T I O N T h e gain from t e m p e r i n g can be e s t i m a t e d , ass u m i n g t h a t only the s m a l l e s t m a s s in t h e set would have o t h e r w i s e been s i m u l a t e d . F o r simplicity consider either simple t e m p e r i n g with only one configuration, or j u s t one of t h e configurations in a p a r a l l e l run. Let Vx be the longest relevant a u t o - c o r r e l a t i o n t i m e in units of t r a j e c t o r i e s , a n d T× t h e wall-clock t i m e per t r a j e c t o r y , w i t h x = H M C for a s t a n d a r d h y b r i d M o n t e C a r l o s i m u l a t i o n a n d x = T for a t e m p e r e d run. T h e t o t a l cost of each is t h e n Cx = ~x x Tx.

T a k i n g t h e t i m e for a single H M C t r a j e c t o r y to be THMC = rn - a the t i m e p e r t r a j e c t o r y for a parallel t e m p e r i n g run is N--1

TT = Z

+

(2)

i=0

R e p l a c i n g t h e s u m w i t h a n integral, a n d choosing oz = 2.5, mmax = 2mmin yields TT = 0.43NmTHMC. So t h e cost of a t e m p e r e d run will be lower if THMC

TT < 0 . 4 3 N ~ "

(3)

T h e final result d e p e n d s on t h e t o t a l saving at t h e heavier masses, t h e n u m b e r of m a s s e s needed, a n d the a c c e p t a n c e r a t e for s w a p p i n g masses. 3. R E S U L T S Here p r e l i m i n a r y results for p a r a l l e l t e m p e r e d fermions from a s i m u l a t i o n with four s t a g g e r e d fermions on an 8 a x 12 l a t t i c e will be discussed. T h e general conclusions a n d results a r e also applicable to the s i m p l e r t e m p e r i n g m e t h o d .

G. Boyd/Nuclear Physics B (Proc. Suppl.) 60A (1998) 341-344

343

Table 1 Parameters and results from the tempered run (T) and a standard HMC run as reference. The trajectories have unit length. The last three columns give the acceptance rate for the Metropolis mass changes, the plaquette and the 2 x 2 Wilson loop integrated autocorrelation times respectively. m (T) 0.020

(T) 0.024 (T) (T) (T) (HMC) (HMC) (HMC)

5

0.028 0.032 0.036 0.020 0.040 0.060

d~0.00550

Acc.% 82

Acci,-.i+1% 15

0.00640

350

80

21

3.2(11)

3.0(7)

0.00700 0.00770 0.00840 0.00550 0.00900 0.01300

283 231 193 475 166 84

77 78 80 78 77 73

24 21 -

4.8(24) 3.6(12) 4.8(25) 10.2(30) 8.4(23) 7.9(18)

4.0(20) 3.0(13) 3.0(6) 9.0(20) 7.1(19) 6.3(17)

I

I] I 2

1 0

5O0

nitl 1000

Figure 1. Part of the time history of the configuration number running at m -- 0.020 in the tempered run.

Two sets of runs are in progress; a tempered one with masses of {0.020, 0.024, 0.028, 0.032, 0.036}, called set 'T', and three standard HMC runs at masses 0.020, 0.040 and 0.060. The run parameters are given in table 1. So far about 2000 units in T have been run for T, and about 3500 for the HMC run. Which of the five configurations of the tern-

Ti~ P 5.0(20)

Tinl4t2×2

s/traj 475

3.5(13)

pered run used m -- 0.020 at which time can be seen in the time history of figure 1. It is clear that all five configurations have run at each mass about equally often, as required. The acceptance rate for transitions between masses is also shown in table 1, and lies around 20%. Another run with masses {0.020, 0.023, 0.026, 0.029, 0.032} yielded rates about ten percentage points higher. An acceptance rate of around 20% to 30% seems optimal. A measure of the speed of an algorithm is riot, the integrated auto-correlation [5], obtained from the auto-correlation function C ° (T) for an observable O. With insufficient data, Ndata < 1000Tint, a n accurate value cannot be obtained. The largest value for riot over the set of all observables O defines the slowest mode, and hence the number of independent measurements. For full QCD the global topological charge Q seems to be the slowest observable[l] . However, on this size lattice the topology measured using the field theoretic method turned out to depend very strongly on the action used for cooling, and is probably not well defined 1. The plaquette and Wilson loops up to 2 x 2 seem to have the most well defined auto-correlation, and will be considered below. The pion correlator should also show long-range correlations, but is very noisy, with large amplitude fluctuations, which prevent the 1The cooling actions tested used 1 × 1 and 1 x 2 loops with various values of the coefficients. On a given configuration different choices for the action lead to completely different global topological charges.

G. Boyd/Nuclear Physics B (Proc. SuppL) 60,4 (1998) 341-344

344

I

I

I

I

I

i

I

I

0.020T

I

-

0.028T

\ ii iiii ii . . . . 0.0

....... : .......

0.1

faster if all are needed for a chiral extrapolation. For realistic simulations, using improved actions on large, smooth lattices at very small quark masses, tempered methods are likely to be of benefit. This is especially true for Wilson fermions, where many n values are needed in order to extract meaningful physics.

ACKNOWLEDGEMENTS I

I

I

5

10

15

20

I

I

I

I

I

25

30

35

40

45

50

t

Figure 2. The auto-correlation function for the 2 × 2 Wilson loop at 7n = 0.020 from set T and a standard HMC reference run.

GB was partially supported by the European Union H u m a n Capital and Mobility program under HCM-Fellowship contract ERBCHBGCT940665, and is currently supported by the Japanese Society for the Promotion of Science. The author is grateful to both the Pisa and Tsukuba lattice groups for many useful discussions related to this work.

REFERENCES G. Boyd et al., Proceedings, Lattice 96, I F U P - T H 47/96, hep-lat/96008123; B. All,s, et al., Phys. Lett B 389 (1996) 107; K. Bitar et al., Phys. Rev. D 4 2 (1990) 3794. 2. E. Marinari and G. Parisi, Europhys. Lett. 19 (1992) 451; A. P. Lyubartsev et al., J. Chem. Phys. 96 (1992) 1776; L. Fern£ndez et al., J. Phys. I France 5 (1995) 1247; E. Marinari, proceedings, Bielefeld workshop on Multi-scale Phenomena, October 1996. and cond-mat/9612010. 3. G. Boyd, proceedings, Bielefeld workshop on Multi-scale Phenomena, October 1996; heplat/9701009. 4. S. Gottlieb et al., Phys. Rev. D 35 (1987) 2531; A. D. Kennedy, Nucl. Phys. B (Proc. Suppl.) 30 (1993) 96. 5. See, for example, A. D. Sokal, in Quant u m Fields on the Computer, Ed. M. Creutz, World Scientific 1992. 1.

estimation of a long auto-correlation time. In figure 2 the auto-correlation function for the 2 x 2 Wilson loop at rn = 0.020 from the tempering runs is plotted, along with one from the standard HMC run. The integrated autocorrelations obtained for both the plaquette and the 2 x 2 Wilson loop are given in table 1. These turn out to be about 20% smaller than the slope parameter needed to fit the central part of the correlator in figure 2 to an exponential. For observables from the tempered run, tint is two to three times smaller than from the standard HMC run. However, the computer time needed per tempered trajectory, TT, is larger than the time per trajectory for the HMC run at rn = 0.02; TT : .q. .,),)'T'm=O.02 . . HMC . So, if you only want results at m = 0.02, the final cost is the same. 4. C O N C L U S I O N S Tempering yields an integrated autocorrelation that is about a factor of two to three smaller than the HMC run, although much better data is needed to make this reliable! As the T run was thrice as costly, for this size lattice tempering is probably as fast as the standard HMC if only the lowest mass in the T run is required, and