Computer Physics Communications 25 (1982) 57—62 North-Holland Publishing Company
VECTORIZING THE MONTE CARLO ALGORITHM FOR LATTICE GAUGE THEORY CALCULATIONS ON THE CDC CYBER 205

D. BARKAI
Control Data Limited, 179-199 Shaftesbury Avenue, London WC2H 8AR, UK

and

K.J.M. MORIARTY
Department of Mathematics, Royal Holloway College, Egham, Surrey TW20 0EX, UK

Received 6 August 1981
PROGRAM SUMMARY

Title of program: LATTICE

Catalogue number: AARH

Listing available from: CPC Program Library, Queen's University of Belfast, N. Ireland (see application form in this issue)

Computer: CDC CYBER 205; Installation: CYBERNET Data Center, Minneapolis, MN, USA (the test run was executed on the system while it was being checked out)

Operating system: CYBER 200 O.S.

Programming language used: CYBER 200 FORTRAN

High speed storage required: 65-256 K (virtual memory system)

Number of bits in a word: 64

Overlay structure: none

Number of magnetic tapes required: none

Other peripherals used: card reader, line printer

Number of cards in combined program and test deck: 662

Card punching code: CDC

Keywords: lattice gauge theory, SU(4) gauge theory, SU(N) gauge theory, Yang-Mills theory, non-Abelian gauge theory, non-perturbative effects, phase transitions, statistical mechanics, action per plaquette, Monte Carlo techniques, vector processors

Nature of the physical problem
The program calculates the average action per plaquette for SU(4) lattice gauge theory. Gauge theories on a lattice were originally proposed by Wilson [1] and Polyakov [2] for the regularization of the divergences of quantum field theory.

Method of solution
A Monte Carlo simulation of the system, set up on a lattice (with varying numbers of sites per dimension) under an SU(4) gauge field, generates a series of field configurations. The method of Metropolis et al. [3] is used to achieve statistical equilibrium. Once in a state of statistical equilibrium, any order parameter for the system can be measured. The vector capability of the CDC CYBER 205 is exploited to reduce the execution time significantly compared with a scalar machine.

Restrictions on the complexity of the program
The storage required depends on the number n of dimensions and the number l of lattice sites along each dimension. The arrays in the program with the largest dimensions are dimensioned to n l^n (i.e. the number of links in the lattice). The execution time increases with the number of links and the number of Monte Carlo iterations. The code will only execute on CDC CYBER 200 computers because it uses extensions of FORTRAN which amount to syntax for vector operations.

Typical running time
The test run, with a 4-dimensional lattice of 3 sites per dimension and 450 Monte Carlo iterations, took 119 s to execute on the CDC CYBER 205.

References
[1] K.G. Wilson, Phys. Rev. D10 (1974) 2445.
[2] A.M. Polyakov, unpublished.
[3] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, J. Chem. Phys. 21 (1953) 1087.
LONG WRITE-UP

1. Introduction

The program calculates the average action per plaquette for an SU(4) gauge group on a regular lattice of 4 dimensions. Lattice gauge theories are used to study invariant quantum field theories free of divergences and hence to understand confinement. Using the techniques of statistical mechanics, gauge theories on a lattice can be studied by Monte Carlo simulation. Up to now, all implementations of Monte Carlo simulation of lattice gauge theory have been on scalar machines. Because of the random nature of the problem and the dependence on neighbouring links, which is more complicated than the checker-board pattern, it was long felt that these techniques would not vectorize efficiently. However, by numbering the grid points along diagonals, it is possible to find independent links which may be processed in parallel. The present program is set up to calculate SU(4) in four space-time dimensions with three sites in each direction. It can easily be generalized to SU(N) for any number of space-time dimensions and any number of sites in a direction.
2. Outline of the theory

A comprehensive review of lattice systems is contained in ref. [1]. The lattice is a regular square, cubic or hypercubic array of sites in n >= 2 dimensions with l sites on a side. A site is labelled by integer coordinates m = (m_1, m_2, ..., m_n), where 1 <= m_i <= l. The lattice contains n l^n independent links. Periodic boundary conditions are used throughout our calculations. We define neighbours to sites on the edge of the lattice by having

(m_1, ..., m_j + l, ..., m_n) = (m_1, ..., m_j, ..., m_n).

Thus, the lattice repeats in every direction. A plaquette is a square of nearest-neighbour sites in a lattice plane, which we label (ijkl) (see fig. 1). The action for this typical plaquette p is

S_p = 1 - (1/N) Re Tr(U_ij U_jk U_kl U_li),

and the total action for the system S[U] is

S[U] = Σ_p S_p,

where each combination (ijkl) is counted only once. The partition function is

Z = ∫ (Π dU_ij) exp(-β S[U]),

where the measure is the normalized invariant Haar measure for SU(4). The inverse temperature is β = 8/g_0^2, where g_0 is the bare coupling constant.

Fig. 1. A plaquette of the lattice (sites i, j, k, l with link matrices U_ij, U_jk, U_kl, U_li around the square).
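For concreteness, the following short sketch (Python with NumPy rather than the CYBER 200 FORTRAN of the program; the function and variable names are our own) evaluates the action of a single plaquette from its four link matrices according to the formula above.

```python
import numpy as np

def plaquette_action(u_ij, u_jk, u_kl, u_li):
    """S_p = 1 - (1/N) Re Tr(U_ij U_jk U_kl U_li) for one plaquette.

    Each argument is an N x N complex matrix attached to one link of the
    plaquette (ijkl); N = 4 for the SU(4) case treated in the paper.
    """
    n = u_ij.shape[0]
    loop = u_ij @ u_jk @ u_kl @ u_li   # ordered product around the plaquette
    return 1.0 - np.trace(loop).real / n

# For an ordered ("cold") start every link is the unit matrix, so S_p = 0.
identity = np.eye(4, dtype=complex)
print(plaquette_action(identity, identity, identity, identity))  # -> 0.0
```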
3. Monte Carlo method

A comprehensive review of Monte Carlo techniques is given in ref. [2]. At values of β away from the critical region, the lattice configuration quickly converges to statistical equilibrium with small thermal fluctuations. Near a critical value of β, the relaxation time gets longer and the lattice configuration takes a long time to reach statistical equilibrium. For a limited number of iterations, hysteresis effects will then be apparent.
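The accept/reject rule at the heart of each link update is simple. The sketch below (Python, with invented names; it is not taken from the program itself) accepts a trial change of a link whenever it lowers the action and otherwise accepts it with probability exp(-β ΔS).

```python
import numpy as np

rng = np.random.default_rng()

def metropolis_accept(delta_s, beta):
    """Metropolis accept/reject decision for a proposed link update.

    delta_s is the change in the total action produced by the trial link
    (only the plaquettes containing that link contribute); beta = 8/g0**2.
    """
    if delta_s <= 0.0:
        return True                       # always accept a decrease of the action
    return rng.random() < np.exp(-beta * delta_s)

# Repeated sweeps over all links with this rule drive the configuration
# towards statistical equilibrium at inverse temperature beta.
```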
4. The program

A description of the program and its use is included in comment cards. There are two subroutines: MONTE, which performs Monte Carlo sweeps through all the links on the lattice and returns the average action per plaquette, and RENORM, which corrects for rounding errors in our SU(4) matrices. The main program LATTICE initializes the lattice to an ordered configuration or to a disordered configuration. The main program also includes assignment statements for the parameters β, n and l. To run the program, the user alters these assignments to suit his needs and makes corresponding adjustments in the dimensions of the arrays storing lattice variables. The program calls RANF, which returns a random number uniformly distributed between 0 and 1. The main program and RENORM are easily understood, so we now concentrate on the subroutine MONTE.

For some time it was felt that implementations of SU(N) lattice gauge theories have iterative properties which prevent them from being efficiently vectorized for a class VI (vector processor) computer system. If the calculation for a given link depends on the result for the previous link, then these two links cannot be computed in parallel and the process is iterative. That is exactly what happens when one steps through the links along any axis. However, a way of ordering the links exists which produces several groups of links such that the links in each group are independent of each other for a given iteration; the values of quantities associated with the links of a given group may therefore constitute a vector, i.e. may be processed in parallel.

The ordering scheme involves stepping through the links along diagonals of the space-time grid. Consider a plane of l^2 links, where l is the number of links in a direction. The first group gets its first "members" by stepping along the main diagonal. The next two groups get their first "members" from the upper and lower diagonals plus one link each from the ends of the skew-diagonal, where the addition to the lower-diagonal group comes from the upper end of the skew-diagonal. The next groups are assigned by advancing up and down along the remaining diagonals, with additional links added to these groups by advancing down and up from the ends of the skew-diagonal. The process continues until all links on the plane are exhausted. For example, if the number of sites along an axis is 5 there will be 5 such groups (numbered 1 to 5), as in fig. 2. As can be seen, the result is a form of "magic square" solution in which each row is a right-shifted, wrap-around copy of the previous row.

So far we have described the contribution to the groups of links arising from the first plane. To continue the process of building up the groups, one has to notice that in the next plane, though the links are divided into similar l groups, their numbering has to differ from that of the plane below. This is due to vertical link dependencies. We chose a cyclic permutation scheme for varying the group numbering. Consider fig. 2: in the second plane the links marked "1" will now be numbered "5", those marked "2" will be numbered "1", etc. This process continues until all l^n links are grouped. It is then repeated for the n dimensions. So, in fact, one ends up with n l groups. The group size is l^(n-1), and each group can now be processed in parallel, as a vector, since it contains links independent of each other.
Fig. 2. "Magic square" solution of the grouping of links on a plane (group numbers 1 to 5 for l = 5; each row is a right-shifted, wrap-around copy of the row above).
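A minimal sketch of this grouping, in Python and with our own choice of indexing (assuming the cyclic-shift numbering described above, which may differ in detail from the ordering used in the program), assigns to every link of an l x l plane a group number that is constant along diagonals and then permutes the numbering cyclically from one plane to the next:

```python
import numpy as np

def plane_groups(l, shift=0):
    """Group numbers (1..l) for the l*l links of one plane.

    Each row is a right-shifted, wrap-around copy of the previous row, so the
    l links of a given group never share a site; `shift` applies the cyclic
    permutation used when moving to the plane above.
    """
    i, j = np.indices((l, l))
    return (i - j + shift) % l + 1

print(plane_groups(5))           # a "magic square" pattern as in fig. 2
print(plane_groups(5, shift=1))  # cyclically permuted numbering for the next plane
```

Because these group-number tables depend only on l and n, they need to be built only once, as index lists, at the start of the run.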
The ordering process overhead is insignificant since it needs to be done only once, when the appropriate "index lists" describing the ordering are created. The techniques for sweeping through the links of the space-time lattice have been discussed at length in previous papers [3,4] and, therefore, will not be reproduced here.
Fig. 3. Schematic flowchart of one iteration by the program. The dashed box indicates the portion of the program which has been vectorized; on each pass through this box, l^(n-1) links are processed in parallel. (The steps are: initialise coordinates; for each group and each direction, find the neighbouring sites for the group, calculate the action around the two plaquettes for each K, L plane with K not equal to L, and apply the Metropolis method to the links; when all groups are done, calculate the average and write out the result.)

All the complex arithmetic required for dealing with SU(N) has been formulated in terms of real floating-point operations. All the time-consuming operations are matrix add and multiply. For the SU(N) gauge group an N x N matrix is associated with each of the n l^n links. This gives rise to further parallelism. A technique for matrix multiply, which may be called the "long outer-product", is used. With this method the whole of the result matrix can be computed in parallel, i.e. in N steps compared to N^3 steps in scalar mode. Remember, l^(n-1) links are processed in parallel; with this method for the matrix operations, the effective vector length for most of the arithmetic is N^2 l^(n-1).

So far we have explained how to partition the grid of links into groups of independent links and, once the groups are bunched together, how to perform the arithmetic quickly. The whole scheme would still fail to be efficient if one could not "GATHER" elements from, and "SCATTER" elements to, the result arrays at a rate comparable to the rate of performing floating-point operations. The use of the Monte Carlo technique for the multi-dimensional integration means that one has to pick up values from various arrays at random. In addition, the computations involving any one link need neighbouring links in (n - 1) directions. The locations of these values, when considered together with the new ordering of the links, are not consecutive. Thus, before they can be "streamed" (used in vector computations) they also need to be "GATHER"ed. As it happens, the CDC CYBER 205 instruction repertoire caters for exactly this situation very effectively. Hardware instructions exist on this system which perform the required "GATHER"/"SCATTER" operations. They collect together ("GATHER") or put in place ("SCATTER") elements by stepping through an index list (which may be totally random) and fetching elements from, or storing elements into, input and output arrays, respectively.
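The sketch below illustrates the idea in Python/NumPy (the array names are invented; NumPy fancy indexing stands in for the CYBER 205 GATHER/SCATTER instructions and broadcasting stands in for the long vector arithmetic): the result matrices for a whole group of links are built up in N steps, rather than N^3 scalar steps per link.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4                  # SU(N) matrix size
l, n = 3, 4            # sites per side, number of dimensions
nlinks = l ** (n - 1)  # links in one independent group
total = n * l ** n     # all links on the lattice

# Complex matrices standing in for the stored link variables (SU(4) in the program).
links = rng.standard_normal((total, N, N)) + 1j * rng.standard_normal((total, N, N))

# "GATHER": collect the matrices of one group through an index list,
# which may address the link array in any (even random) order.
index_list = rng.choice(total, size=nlinks, replace=False)
a = links[index_list]                                       # shape (nlinks, N, N)
b = rng.standard_normal((nlinks, N, N)) + 1j * rng.standard_normal((nlinks, N, N))

# "Long outer-product" multiply: the whole result matrix for every link of the
# group is accumulated in N steps; each step is one element-wise vector
# operation of effective length N*N*nlinks (instead of N**3 scalar steps per link).
c = np.zeros_like(a)
for k in range(N):
    c += a[:, :, k, None] * b[:, None, k, :]

assert np.allclose(c, a @ b)      # agrees with an ordinary matrix product

# "SCATTER": store the results back through the same index list.
updated = links.copy()
updated[index_list] = c
```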
The key variables of the program are listed below.

In COMMON/VAR/:
B is the inverse effective temperature.
ISIZE is the number of lattice sites along an axis (l).
MDOWN, MUP and IPOWER are used for identifying neighbouring links.
APQ is the action over one iteration through the lattice.
NGR (= NGROUP) is the value N of the gauge group SU(N) used.

In COMMON/VAR1/:
ALATR and ALATI contain all the real and imaginary parts of the link arrays, respectively.

In COMMON/VAR2/:
ACTAV contains the average action per plaquette over all the iterations (apart from the initial disordering of the lattice).
ACT contains the average of the action over all the iterations, excluding some number of iterations at the beginning of the run.
IACTAV and IACT are switches: when they are 0, do not accumulate into ACTAV and ACT; when they are 1, accumulate into ACTAV and ACT, respectively.
The algorithm described in this paper was implemented for ISIZE = 3 and NGROUP = 4 [5] with a disordered initial configuration and B = 10.5, running for 450 iterations. Sample output is shown in our test run output. The CDC CYBER 205 performed this preliminary implementation 5 times faster than the CDC 7600. This implementation can be easily generalized to different values of the lattice size and different SU(N) gauge groups. In fact, for larger ISIZE and/or NGROUP values even more impressive performance factors will be observed when compared to a traditional serial system such as the CDC 7600.

Acknowledgement

We wish to thank Dr. M. Creutz for conveying to us his algorithm for scalar machines, for much correspondence and for his valued friendship.

References
[1] J.B. Kogut, Rev. Mod. Phys. 51 (1979) 659.
[2] Monte Carlo methods, ed. K. Binder (Springer, Berlin, 1979).
[3] R.C. Edgar, L. McCrossen and K.J.M. Moriarty, Comput. Phys. Commun. 22 (1981) 433.
[4] R.W.B. Ardill and K.J.M. Moriarty, Comput. Phys. Commun. 24 (1981) 989.
[5] K.J.M. Moriarty, A phase transition in SU(4) four-dimensional lattice gauge theory, Phys. Lett. 106B (1981).
TEST RUN OUTPUT

MONTE-CARLO SIMULATION OF SU(4) LATTICE GAUGE THEORY

LATTICE IS 4 DIMENSIONAL WITH 3 SITES EACH SIDE
INITIAL CONFIGURATION IS DISORDERED
NUMBER OF ITERATIONS IS 450
TOTAL INVERSE TEMPERATURE (BETA) IS 10.500

ACTION PER PLAQUETTE
[Table of ITER. and ACT/PLAQ, printed every 10 iterations from 10 to 450.]

AVERAGE OVER ALL 450 ITERATIONS IS .471105
AVERAGE OVER LAST 400 ITERATIONS IS .457526

TOTAL CP TIME FOR PROGRAM IS 119.2870 SECONDS