Lattice QCD — A challenge in large scale computing


Computer Physics Communications, North-Holland, Amsterdam

K. SCHILLING
University of Wuppertal, Gausstr., D-56 Wuppertal, Fed. Rep. Germany

The computation of the hadron spectrum within the framework of lattice QCD sets a demanding goal for the use of supercomputers in basic science. It requires both big computer capacities and clever algorithms to fight all the numerical evils that one encounters in the Euclidean space-time world. The talk attempts to give an introduction to the present state of the art of spectrum calculations by lattice simulations.

In the past forty years electronic computers have been boosted by many orders of magnitude in their performance, mainly due to the great technological progress in microelectronics. This has of course changed the notion of a supercomputer over the years, since a supercomputer by definition is the most powerful machine at a given time. By this definition, ENIAC was the first electronic supercomputer 40 years ago. Let me characterize its performance with a few numbers that shed some light on the visionary power of the pioneer John von Neumann, who coined the term 'large scale computing' at the time: ENIAC was made up of 22000 valves, its fast memory had a capacity of 20 ten-digit numbers, and its mass storage could hold one hundred (!) such numbers. The computing speed of the time was 250 flops. Compare this with the supercomputer to come out this year, the ETA10: it has some 20000 gates/chip in complementary metal oxide technology (CMOS). With a heat loss of 2 W/chip, one only needs to cool 100 W in total. With a miniaturisation of 2400 gates/cm, one can integrate a complete CPU onto a single board. The peak performance is promised to reach 10 Gflops. Are we fully aware of the feedback of this development onto science? I would like to show you in the rest of my talk that the present generation of supercomputers, and the ones to come in the near future, will enable us to tackle problems in basic science that were hitherto out of reach.

In fact, the rate of qualitative progress that the computers presently offer to basic science is so substantial that one might be tempted to talk about a new era of science opening up. It might be useful to put the short history of electronic computing into perspective against the timescale of modern science. You all know that the invention of the analysis (calculus) has very strongly influenced the development of physics. It took about two hundred years from Leibniz and Newton to develop the analysis to its full blossom in the last century. The analysis is a very mighty tool, both for the formulation and for the solution of physical problems. Just think of the beautiful solution of the two-body problem in celestial and in quantum mechanics, 17th and 20th century physics! Yet, at the same time, we notice the limits of analytical methods in solving problems. They end as early as the three-body problem! So you soon arrive at perturbation theory. And a perturbation theory even of the three-body problem becomes notoriously complicated in the higher orders! As soon as you leave the regime of linear phenomena, you are totally lost with your analytical tools. That is to say, the larger part of physics is out of the reach of classical analytical methods. Well-known examples are the anharmonic oscillator or the Navier-Stokes equations of hydrodynamics. Clever engineers invented the analog experiment long ago to circumvent the problem.


Based on their intuition about simplified hydrodynamical situations, they let nature itself compute the nonlinear patterns of physics, e.g. by simulating aerodynamical flow in a wind tunnel in order to optimize airplane wings. This way mankind was able to build airplanes without fully understanding such complex phenomena as turbulence. In basic science, it is in general not so easy to find an analog computer for the simulation of a complex system. Yet, we can set up Monte Carlo 'computer experiments' on digital machines in order to study the macroscopic behaviour of such systems, starting from their microscopic laws of dynamics.

This offers a clue how to approach the 'solution' of quantum field theories: indeed, back in 1974 Ken Wilson proposed to make contact, in the treatment of a quantum field theory (QFT), with the methods used in statistical mechanics [1], by putting the field theory on a discrete space-time lattice. As you know, the evaluation of a QFT implies the computation of n-point functions or correlation functions, which contain all the physics of the respective theory. They can formally be written in terms of functional integrals,

⟨φ(x_1) ... φ(x_n)⟩ = (1/Z) ∫ Π d{Fields} φ(x_1) ... φ(x_n) e^{iS},    (1)

which in most cases again (!) cannot be solved analytically. The traditional way to tackle these integrals is to use perturbation theory. Perturbative field theory is notoriously plagued by infinities, which arise from the nonconvergence of loop integrals in the infrared and/or ultraviolet regimes. The ultraviolet problems can be blamed on the fact that in the naive field theory with point particles you pretend to know the correct physics down to all submicroscopic length scales. This is of course not justified. The ultraviolet divergences can be avoided by introducing cut-offs. A common regularisation procedure to get rid of them works by the introduction of a high-momentum cut-off P in the loop integrals. This amounts to injecting a minimal length a = 1/P. Or, to put it differently: you discretize the continuum field theory onto a 4-dimensional space-time lattice and thus turn it into a well defined lattice field theory. At the same time, the functional integral turns into an ordinary multiple integral. This integral is still of enormous dimension: its dimension equals the number of degrees of freedom of the system, i.e. it is of the order of the lattice volume V.

We are of course finally interested in the continuum limit a → 0 of the theory. This limit can be realized by refining the lattice in sequential steps {a_j}; the renormalization group tells you how to do it. You just tune the coupling constants of the theory in each step, such that physical predictions become independent of the actual lattice spacing. Such a procedure should intuitively be the correct one, as long as the physical quantity to be computed, call it a mass M, implies a length which is big compared to the actual lattice spacing,

ξ = 1/M >> a.    (2)

To phrase it differently: the lattice should be fine enough to possess many lattice points within the physical correlation length. In this manner, the lattice method promises to be a useful technique to compute the previously inaccessible long-range behaviour of the theory. We shall see in a minute that field theory is able to predict the relationship between coupling constant and cut-off in the case of Quantum Chromodynamics, the theory describing the strong interactions of quarks, which bind together into hadrons. Let me just point out at this stage the close formal analogy between the partition function of statistical mechanics and the lattice field theory expression for the lattice approximation of the functional integral eq. (1), after passing to Euclidean space-time, t → it:

Z = ∫ Π d{Fields} e^{-S}.    (3)

Eq. (3) hints at the practical procedure to evaluate a field theory: you put the theory onto a sufficiently large lattice, go to the Euclidean world, and compute expectation values on an ensemble distributed according to the partition function, generated by a Markov process on a fast supercomputer.
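For the reader who wants to see this Markov chain idea in actual code, here is a minimal sketch in Python. It is a toy model of my own choosing, a one-dimensional lattice scalar field with a quadratic action, not QCD; all names and parameters are illustrative, and the computations described later in this talk were of course written in FORTRAN 77 for vector machines.

```python
import math
import random

# Toy illustration of eq. (3): estimate <phi^2> for a 1D lattice scalar field
# with action S = sum_i [ (phi_{i+1} - phi_i)^2 / 2 + m2 * phi_i^2 / 2 ],
# sampling exp(-S) with a Metropolis Markov chain.

def action_diff(phi, i, new, m2):
    """Change of S when phi[i] is replaced by 'new' (periodic boundaries)."""
    left, right = phi[i - 1], phi[(i + 1) % len(phi)]
    old = phi[i]
    s_old = 0.5 * ((old - left) ** 2 + (right - old) ** 2 + m2 * old ** 2)
    s_new = 0.5 * ((new - left) ** 2 + (right - new) ** 2 + m2 * new ** 2)
    return s_new - s_old

def metropolis_sweep(phi, m2, step=1.0):
    for i in range(len(phi)):
        trial = phi[i] + random.uniform(-step, step)
        d_s = action_diff(phi, i, trial, m2)
        if d_s <= 0.0 or random.random() < math.exp(-d_s):
            phi[i] = trial        # accept with probability min(1, e^{-dS})

random.seed(1)
phi = [0.0] * 64                  # 64-site lattice, cold start
for _ in range(200):              # thermalize
    metropolis_sweep(phi, m2=1.0)

measurements = []
for _ in range(1000):             # measure on the thermalized ensemble
    metropolis_sweep(phi, m2=1.0)
    measurements.append(sum(x * x for x in phi) / len(phi))

print("<phi^2> estimate:", sum(measurements) / len(measurements))
```

The pattern (propose a local change, accept it with the Boltzmann probability, measure on the resulting ensemble) carries over to the SU(3) link variables of lattice QCD; only the action and the update proposal change.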

Quantum chromodynamics on the lattice

QCD has been proposed in the early seventies as the candidate theory for strong interaction physics. It has the form of a local non-Abelian gauge theory, with gluons carrying the interaction. Perturbation theory can be applied to it at large momentum transfers.

This is mainly due to the fact that the theory has an ultraviolet fixed point. That is to say, the coupling constant g goes to zero as the momentum-space cut-off 1/a goes to infinity. More precisely, it has been found [3] that in SU(3) gauge theory the following connection holds, which is frequently alluded to as asymptotic freedom:

a(g) = (1/Λ_L) (β_0 g²)^(-β_1/(2β_0²)) exp(-1/(2β_0 g²)).    (4)

Contrary to the history of the quantum mechanics of Heisenberg and Schrödinger, which enjoyed an immediate triumph with the correct computation of the Balmer formula, the verification of QCD has seen only slow progress so far. The main testing grounds for the validity of QCD have in fact been experiments at high energies and momentum transfers, where perturbation theory is expected to work. Detailed experimental checks along this line are greatly hampered by the fact that perturbation theory is done for quarks, which are only indirectly observable, after their hadronization, which is again a nonperturbative low-energy phenomenon. The computation of hadron mass spectra, however, is a low-energy physics problem and therefore clearly out of reach of perturbation theory; any progress in computing the analogue of the Balmer formula in hadronic physics necessarily requires new theoretical tools, and numerical lattice simulations are hoped to supply them.

Research activities almost exploded after the demonstration by Creutz [4] that the static quark-antiquark potential in pure gauge theory without fermions, as obtained from Monte Carlo simulations on small lattices, appeared to rise linearly at large distances (confinement property), with a slope (string tension) that shows asymptotic scaling according to eq. (4).

In the following, numerical lattice methods have been applied to tackle a large variety of problems in gauge theories, like
- quark potentials;
- QCD thermodynamics;
- hadron masses;
- instanton effects;
- chiral condensates;
- the Higgs phenomenon.
I think it is fair to say that all the previous work in the field must be considered exploratory, in the sense that the systematic errors of the computations are not yet fully under control. But the results are so encouraging that they motivate QCD investigators to proceed more and more into large scale computing, and even into machine building. In this talk, I will not even make the attempt to review all these topics qualitatively (the interested reader might refer to refs. [5] and [6]). My guideline will rather be the research activity of our lattice group in Wuppertal, so I shall be selective and exemplify matters with hadron mass computations.
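As a small numerical aside on eq. (4) before we turn to the lattice formulation: a few lines suffice to see how rapidly the lattice spacing shrinks with growing β = 6/g². The sketch below uses the standard two-loop coefficients of quarkless SU(3), β_0 = 11/(16π²) and β_1 = 102/(16π²)²; the overall scale Λ_L is left open, since fixing it requires matching to a physical quantity such as the string tension.

```python
import math

# Two-loop asymptotic-scaling relation, eq. (4), for SU(3) pure gauge theory:
#   a(g) * Lambda_L = (b0*g^2)**(-b1/(2*b0**2)) * exp(-1/(2*b0*g^2)).
B0 = 11.0 / (16.0 * math.pi ** 2)
B1 = 102.0 / (16.0 * math.pi ** 2) ** 2

def a_times_lambda(beta):
    g2 = 6.0 / beta                      # beta = 6/g^2 for SU(3)
    return (B0 * g2) ** (-B1 / (2.0 * B0 ** 2)) * math.exp(-1.0 / (2.0 * B0 * g2))

for beta in (5.7, 6.0, 6.3):
    print(f"beta = {beta}:  a * Lambda_L = {a_times_lambda(beta):.3e}")

print("ratio a(beta=6.0) / a(beta=6.3):", a_times_lambda(6.0) / a_times_lambda(6.3))
```

Only dimensionless combinations are printed; the point is merely that a modest increase of β buys a noticeably finer lattice.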

How do you write the QCD Lagrangian,

L = -(1/4) F^A_{μν} F^{A μν} + Σ_flavours ψ̄ (i γ^μ D_μ - m) ψ,    (5)

F^A_{μν} = ∂_μ A^A_ν - ∂_ν A^A_μ + g f^{ABC} A^B_μ A^C_ν,
covariant derivative D_μ = ∂_μ + (1/2) i g λ_A A^A_μ

(Notation: A, B, C colour indices, f^{ABC} the structure constants, λ_A the Gell-Mann matrices, g the bare coupling 'constant'), on a lattice?

Let's start with the action of matterless QCD. The basic ingredient is the interaction of the gauge field. This field is living on the links between the lattice points and has values in the gauge group SU(3). We denote it by U(i, μ), where i stands for the site and μ for the direction of the link starting from i. The interaction is then given by a closed loop construct,

U_plaq(i; μ, ν) = U(i, μ) U(i+μ, ν) U†(i+ν, μ) U†(i, ν),    (6)

called the plaquette. It can be shown that the plaquette action

S_g := β Σ_plaq (1 - (1/3) Re Tr U_plaq)    (7)

has the correct formal continuum limit of the pure gauge sector of interacting gluons. It is gauge invariant under the local gauge transformation

U(x, μ) → V†(x) U(x, μ) V(x+μ).    (8)

The combination 6/g² is denoted by β, stressing the close relationship to the statistical mechanics situation. In order to inject quarks as fermions into the lattice version of the theory, one has to discretize the Dirac equation. For the sake of local gauge invariance, the Dirac operator in the continuum theory implies covariant derivatives instead of the normal derivatives, as we remember from our electrodynamics class. On the lattice, the derivative operator is replaced by a difference,

∂_μ ψ(x) → [ψ(i+μ) - ψ(i)]/a,    (9)

so you will have contributions of the form ψ̄(i) γ_μ ψ(i+μ) in the fermionic part of the Lagrangian. These spinorial bilinears can be rendered gauge invariant by inserting the intermediate link operator U(i, i+μ) in between. In this manner, Wilson [1] finally arrived at his version of the fermionic action,

S_F = (1/2) Σ_i (8 + 2ma) ψ̄(i) ψ(i)
      - (1/2) Σ_{i,μ} [ψ̄(i)(1 - γ_μ) U(i, μ) ψ(i+μ) + ψ̄(i+μ)(1 + γ_μ) U†(i, μ) ψ(i)].    (10)

We introduce a 'fermionic' matrix Δ, which implies a nearest neighbour interaction and reads as follows:

Δ_AB(x, y) = δ_{xy} δ_AB - κ Σ_μ [(1 - γ_μ) U_AB(x, μ) δ_{y,x+μ} + (1 + γ_μ) U†_AB(x-μ, μ) δ_{y,x-μ}].    (11)

The capitals A, B stand for SU(3) or colour degrees of freedom, while the γ_μ denote the usual Dirac matrices and κ = 1/(8 + 2ma) is the hopping parameter.
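To make the pure gauge part concrete, here is a minimal sketch of eqs. (6) and (7) in Python with NumPy. The link variables are random SU(3)-like matrices obtained by orthonormalizing complex Gaussian matrices, standing in for a thermalized configuration; lattice size, seed and β are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch of eqs. (6)-(7): Wilson's plaquette action on a tiny 4^4
# lattice with one SU(3) matrix per link, U[t, x, y, z, mu].

L, DIM = 4, 4
rng = np.random.default_rng(0)

def random_su3():
    """Crude SU(3) element: QR-orthonormalize a complex Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
    q = q * (np.diag(r).conj() / np.abs(np.diag(r)))   # fix column phases
    return q / np.linalg.det(q) ** (1.0 / 3.0)          # force det = 1

U = np.array([[random_su3() for mu in range(DIM)]
              for site in range(L ** DIM)]).reshape((L,) * DIM + (DIM, 3, 3))

def shift(site, mu):
    s = list(site)
    s[mu] = (s[mu] + 1) % L                  # periodic neighbour in direction mu
    return tuple(s)

def plaquette(site, mu, nu):
    """U(i,mu) U(i+mu,nu) U(i+nu,mu)^dagger U(i,nu)^dagger, eq. (6)."""
    return (U[site + (mu,)] @ U[shift(site, mu) + (nu,)]
            @ U[shift(site, nu) + (mu,)].conj().T @ U[site + (nu,)].conj().T)

def gauge_action(beta):
    """S_g = beta * sum over plaquettes of (1 - Re Tr U_plaq / 3), eq. (7)."""
    total = 0.0
    for site in np.ndindex((L,) * DIM):
        for mu in range(DIM):
            for nu in range(mu + 1, DIM):
                total += 1.0 - plaquette(site, mu, nu).trace().real / 3.0
    return beta * total

print("S_g on the random configuration:", gauge_action(beta=6.0))
```

On a 'cold' configuration with all links set to the unit matrix the action vanishes, while on the disordered start above it comes out close to β times the number of plaquettes; a Monte Carlo update drives it towards the equilibrium value in between.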

Hadron propagators will be composed out of quark propagators, which can be written in the form of integrals over the gauge fields and the quark fields:

Z ⟨ψ_A(y) ψ̄_B(0)⟩ = ∫ {dU} e^{-S_g} ∫ dψ dψ̄ ψ_A(y) ψ̄_B(0) e^{-ψ̄Δψ}.    (12)

The integration over the quark fields can be done explicitly, with the result

Z ⟨ψ_A(y) ψ̄_B(0)⟩ = ∫ {dU} e^{-S_g} det Δ  Δ⁻¹_AB(y, 0).    (13)

This means that one has to average the inverse fermion matrix over appropriately many gauge configurations U in order to compute the quark propagator. These gauge configurations should be weighted both with the Boltzmann factor and with the fermion determinant. So let's assess the numerical problem, once you have a 'thermalized' background configuration from a Monte Carlo updating procedure: you just have to compute a column of the inverse fermion matrix Δ⁻¹, which is a sparse matrix carrying colour, spin and space-time indices. To figure out the numbers for a 16³ × 28 lattice: you have to deal with a complex matrix of dimension L ≈ 1.4 million, with 12 columns to be computed out of a background SU(3) field with 5.5 million d.o.f. or 22 Mbytes of information; the quark propagators necessary to compose hadrons carry 11 Mbytes of information, with storage requirements given in real*4 accuracy. So this is not a problem for your local MicroVAX any more! To be honest, it even outgrows most of the supercomputer installations available this summer: since a word of memory costs about 1 US$, most present configurations are rather limited in their central memory.
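The bookkeeping behind these numbers is easily redone. In the few lines below I assume that the gauge field is stored with 12 real numbers per link (two complex rows of each SU(3) matrix, a common compression) and that one word is a 4-byte real*4; these assumptions are my own, chosen because they reproduce the figures quoted above.

```python
# Storage bookkeeping for a 16^3 x 28 lattice (assumptions: 12 reals per link,
# i.e. two stored rows of each SU(3) matrix, and 4-byte real*4 words).

sites = 16 ** 3 * 28                     # 114 688 lattice sites
spin_colour = 4 * 3                      # 4 Dirac x 3 colour components

fermion_dim = sites * spin_colour        # dimension of the fermion matrix
print("fermion matrix dimension:", fermion_dim)                    # ~1.4 million

links = sites * 4                        # 4 link variables per site
gauge_reals = links * 12                 # compressed SU(3) link storage
print("gauge field d.o.f.      :", gauge_reals)                    # ~5.5 million
print("gauge field storage     :", gauge_reals * 4 / 1e6, "Mbytes")     # ~22 Mbytes

# one column of the inverse fermion matrix (a quark propagator for one fixed
# source spin-colour component), complex numbers in real*4 words:
print("one propagator column   :", fermion_dim * 2 * 4 / 1e6, "Mbytes")  # ~11 Mbytes
```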



The physical requirements are obvious. In order to reliably compute, say, the proton mass, we require that
- the lattice is fine enough as to have many sites within the proton;
- the spatial lattice is 2 or 3 fm in extent;
- the typical correlation lengths measured in lattice units are large;
- the time extent of the lattice is sufficiently large to guarantee observation of the asymptotic time behaviour.
There is agreement that the regime of asymptotic scaling does not set in before β ≈ 6.0. At this β-value the lattice spacing has been found to be about 0.1 fm [12]. Therefore, the above quoted lattice size corresponds to a world with a volume of 5 fm³, which is not overly large. Yet we are certainly faced with a large-scale computing problem!

As an experimentalist in the Minkowski world, you have the pleasure of observing hadron masses as beautiful peaks in some mass distributions. As a theorist doing computer experiments in the Euclidean world, you have to do it the hard way: you typically observe hadron masses as ground states in the corresponding channels, by analyzing the asymptotic slopes of exponentially decreasing hadron propagators (correlation functions):

G(t, 0) ~ e^{-M_HADRON · t}  for large t.    (14)

That means that you must fight with the noise of the hadron operators. In this theory the quark masses are free parameters that should be tuned to reproduce the empirical rho mass/pion mass ratio. The fermion matrix has zero modes for zero quark masses. Therefore, it is much easier to work with heavy quark masses, i.e. close to the limit of the static quark model. Most of the computations in the past have therefore been done in this region.

The inversion of the fermion matrix converges slowly for a light quark because of its nearby zero mode. For illustration, I would like to mention that one needs, on a 16³ × 28 lattice at β = 6.0, with 'light' quarks of mass about 100 MeV (corresponding to a pion of mass 600 MeV), about 450 iterations of a conventional conjugate gradient algorithm in order to reach the desired accuracy in the quark propagator. This amounts to 6 × 10¹³ floating point operations. Therefore, in the past most of the direct numerical computations worked with quark masses in the hundred MeV region and extrapolated down to light quark masses.

Much of the computational work goes into the computation of small-distance aspects, which are just lattice artifacts and not continuum physics. It is therefore very natural to concentrate on those degrees of freedom which dominate the infrared content of the lattice physics. In this spirit our Wuppertal group has developed an approximative block-diagonalization algorithm that allows us to reduce the quark field degrees of freedom by a factor of 16 [7]. This saving is achieved by blocking the lattice into hypercubes, which carry 16 effective mass modes; since only the zero mode among them is important for the long-range behaviour of the theory, we are able to cut down the problem to the numerical computation within the zero mode sector. We applied this blocking procedure twice, thus reducing the degrees of freedom by a factor of 256. This way, we gain a factor of 12 in computer time per iteration step of the conjugate gradient.

Moreover, since we squeeze the spectrum of the matrix, we gain the benefit of an additional factor of 2.2 in the convergence of the conjugate gradient; altogether we save a factor of 256 on storage and a factor of 25 on CPU time. This puts us in a position to proceed to hadron propagator computations on the unprecedentedly large lattice 24³ × 48, using just the 16 Mbyte 2-pipe Cyber 205 in Karlsruhe. The project so far needed some five hundred CPU hours. From analyzing the hadron propagators over lattice distances of up to 24 lattice spacings, we were able to extract the hadron masses [8].
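The inner loop of all these computations is the conjugate gradient inversion mentioned above. Since the Wilson matrix Δ is not Hermitian, a conventional choice is to run conjugate gradient on the normal equations Δ†Δ x = Δ† b. The sketch below does exactly that, with a small, well-conditioned random matrix standing in for Δ; a production code never forms Δ explicitly but applies the nearest-neighbour hopping term of eq. (11) directly to the fields.

```python
import numpy as np

def cg(apply_a, rhs, tol=1e-8, max_iter=500):
    """Conjugate gradient for a Hermitian positive definite operator apply_a."""
    x = np.zeros_like(rhs)
    r = rhs - apply_a(x)
    p = r.copy()
    rr = np.vdot(r, r).real
    for it in range(max_iter):
        ap = apply_a(p)
        alpha = rr / np.vdot(p, ap).real
        x += alpha * p
        r -= alpha * ap
        rr_new = np.vdot(r, r).real
        if np.sqrt(rr_new) < tol * np.linalg.norm(rhs):
            return x, it + 1
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter

rng = np.random.default_rng(2)
n = 200                                   # stand-in for the ~1.4 million of the text
delta = np.eye(n) + (0.2 / np.sqrt(n)) * (rng.normal(size=(n, n))
                                          + 1j * rng.normal(size=(n, n)))
b = np.zeros(n, dtype=complex)
b[0] = 1.0                                # point source: one column of delta^-1

apply_normal = lambda v: delta.conj().T @ (delta @ v)   # Delta^dagger Delta
x, iters = cg(apply_normal, delta.conj().T @ b)

print("CG iterations:", iters, "  residual |Delta x - b|:", np.linalg.norm(delta @ x - b))
```

The iteration count of such a solver grows as the quark mass is lowered and the smallest eigenvalues of Δ†Δ approach zero, which is precisely the difficulty with light quarks described above.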

Fig. 1 represents the state of the art 1986 in large-lattice mass computations. It shows the comparison of the hadron mass ratios obtained by several groups. Some of these computations were carried out on the distributed array processor DAP; the group from Brookhaven [10] worked with fermions whose Dirac components are distributed over neighbouring lattice sites. On a lattice of given size, the staggered fermionic degrees of freedom are reduced by a factor of 4, but you buy a very odd interference between flavour and Lorentz symmetries on the lattice. Our measurements are performed with Wilson fermions. Computations at higher quark masses and on a 16³ × 48 lattice have been done by a group at Tsukuba University in Japan on a HITACHI machine [11].

All of these calculations neglect the impact of the fermion determinant on the equilibrium distribution of the background fields, the so-called quenched approximation. The reason is simple: you need at least a factor of 100 more computer time to compute the determinant during the Monte Carlo updating [12]. Therefore, a direct detailed study of the effects of dynamical fermions has so far only been attempted for the (unrealistic) SU(2) case [13]. Physically, the effect of the determinant incorporates the effects of quark-antiquark loops on the vacuum of the theory, i.e. dynamical fermions. Therefore, the quenched approximation should be o.k. for heavy quarks. Decreasing the quark mass, dynamical quark effects should enter the game; in fact they become necessary to stabilize the quenched approximation. Indeed, at our lowest quark mass we do observe a very striking deviation of the quenched propagators from the normal Gaussian distribution (on three exceptional out of 28 configurations) [14]. This can be attributed to fluctuations within the eigenvalue spectrum of the fermion matrix towards very low eigenvalues. Therefore, the exceptional configurations are expected to be strongly suppressed by the determinant.


Fig. 1. Hadron mass ratios from lattice computations in the quenched approximation: m(nucleon)/m(rho) vs. m(pi)/m(rho). These mass ratios are a function of the quark mass. The plot contains results on 16³ × 32 from ref. [10] and on 16³ lattices from ref. [9] with Kogut-Susskind fermions at β = 6.0. The Tsukuba group (ref. [11]) works with a modified Wilson action on a 16³ × 48 lattice. Our preliminary data (see ref. [8]) were obtained with the blocking method and Wilson fermions on a 24³ × 48 lattice at β = 6.3 on a 16 Mbyte Cyber 205. Configurations with very small values of the fermion determinant were excluded.
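The masses entering such ratios are read off from the large-time fall-off of eq. (14). A minimal sketch of the standard 'effective mass' estimate is given below; the correlator is synthetic (a ground state plus one excited-state contamination, with invented numbers), since no actual simulation data are reproduced here.

```python
import math

# Effective mass from eq. (14): for large t, G(t) ~ exp(-M t), so the
# logarithmic slope m_eff(t) = ln[G(t) / G(t+1)] flattens onto M.
T = 24
G = [1.0 * math.exp(-0.5 * t) + 0.3 * math.exp(-1.2 * t) for t in range(T)]

for t in range(T - 1):
    m_eff = math.log(G[t] / G[t + 1])
    print(f"t = {t:2d}   m_eff = {m_eff:.4f}")
```

The printed values drift from about 0.6 down onto the input ground-state mass 0.5 as the excited state dies away; on real data one must in addition fight the statistical noise, which grows with t, and this is exactly why long lattices and high statistics are needed.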


On the remaining, non-exceptional configurations ('improved quenched approximation'), our theoretical mass ratios for light quarks are in agreement with the experimental ones (see fig. 1). So we feel confident that the lattice computations will finally reach their goal, once we have powerful enough machines and clever algorithms to handle dynamical fermions.

The hadron mass project is an illustrative example of the increasing scope of large scale computing projects:


1. We could not have done it from the beginning without the help of Philippe de Forcrand at CRAY RESEARCH, Inc. in Chippewa Falls, who supported the early stages of the blocking project on 16³ lattices, running on a multiprocessor machine and attaining a computational speed of 490 Mflops (at 840 Mflops peak rate) [15].

2. We needed substantial computer time on the Cyber 205 machines in Bochum and Karlsruhe in order to develop and apply our method. I think I should mention that, thanks to the highly professional and unbureaucratic management of the Deutsche Forschungsgemeinschaft and of the Karlsruhe computer center, we were able to extend our hadron mass calculations to unprecedentedly large lattices, small quark masses and large statistics. The good old 2-pipe Cyber 205 performed beautifully, delivering 65 Mflops, or 65% of the peak rate, on the conjugate gradient algorithm (64 bit accuracy).

3. On the other hand, we could not trust our results if we were not able to finally get a handle on the systematic errors of our blocking approximation: in order to test our method, we have done direct hadron mass computations on a FUJITSU VP200 machine on a 16³ × 28 lattice. Collaborators in this project are O. Haan and E. Schnepf from the Siemens Company in Munich, which moreover generously supplied us with the necessary computer time. Fig. 2 shows some preliminary data of ours on the pion propagator, which you can trace very nicely over 14 lattice spacings [15]. As a result of this computation, we convinced ourselves that the hadron mass ratios are insensitive to the blocking procedure, in the sense that the 'systematic' errors due to blocking are within the present 'statistical accuracy' of the simulation data [15].

You might wonder why we bothered to rewrite our code for another machine. Well, the reason for choosing a Fujitsu machine was twofold: first, the FUJITSU VP200 is presently the machine with the largest available memory (64 Mbytes) in Germany, and second, we were just curious to study the performance of a non-US product against the Cyber 205.

In case you are interested to learn about our experience: compared to the Cyber 205 the Fujitsu has a beautiful vector compiler that does all the work for you. So the user just develops his FORTRAN 77 code on his homely VAX, sends it to the friendly front-end Fujitsu computer (with its unfriendly operating system!) and quickly becomes productive on the vector machine without much change to his code. To be specific: our program is mainly a conjugate gradient algorithm, which performed 300 Mflops on the VP200 with 520 Mflops peak rate (to be compared with the 65 Mflops in double precision reached with a similar program on the 2-pipe CYBER 205 in Karlsruhe with 100 Mflops peak rate). But sadly, in spite of the impressively high megaflop rate, the VP200 is kept busy by the huge lattice: it needs 6 h to compute one full quark propagator (with the 12 spin-colour d.o.f. of the source) at an intermediate quark mass.


Outlook

I think I have given you a taste that lattice QCD is quite a challenge and provides motivation for the theorists to develop better algorithms. We need a factor of 10 more computer power than what we have today in order to pin down the quenched approximation and to look into the questions of dynamical fermions, and a factor of 100 to settle dynamical fermions. In Germany, the Deutsche Forschungsgemeinschaft has just been convinced to set up a major research focus (Forschungsschwerpunkt) to support university research on lattice gauge theory, and federal research centers like DESY and KFA Jülich are launching, together with the universities, the project of a computational research center for theoretical physics, which will provide computer time to larger research projects, just as CERN offers beam time at its accelerators. The center is supposed to join both computer scientists and theoretical physicists. This development in Germany follows the recent setup of an impressive number of supercomputer centers in the US.

While the big computer vendors move rather cautiously into parallel computing, ICL built some years ago a distributed array processor (4096 processing elements) that our colleagues in Edinburgh have used very successfully to do lattice physics on. Quite a few 'lattice physicists' feel that one should not rely on the most sophisticated hardware, but rather build massively parallel machines (mostly single instruction, multiple data) using 'cheap', more or less off-the-shelf components.






One such project chose a design with addition/multiplication units in each processor, bundling 16 such vector units together with a communication channel between them. A Caltech group (with G. Fox) has pioneered a concurrent computation program working with 32, 64 and 128 nodes connected like hypercubes. A group at Southampton (with T. Hey) is building a parallel machine based on the transputer, a programmable hardware switch for the processor communication. D. Weingarten is building, with a small group at the IBM Watson Research Center, a massively parallel system with 576 processors of 20 Mflops each [18], in 'old fashioned' technology with 200 kW heating power.

I am sure that all this effort will pay off and that we can report on substantial progress, say, in five years' time from now. Which is very soon, as seen on the historical time-scale of analysis!

Acknowledgements

I would like to express my gratitude to Dr. H. Gieti from the Siemens company for his interest in and support for lattice QCD.

References

[1] K.G. Wilson, Phys. Rev. D10 (1974) 2445.
[2] See e.g. the textbook of C. Itzykson and J.-B. Zuber, Quantum Field Theory (McGraw-Hill, New York, 1980).
[3] For the history of asymptotic freedom, see G. 't Hooft, Proc. of the Colloquium in Memoriam Kurt Symanzik, Hamburg, Febr. 1984 (North-Holland, Amsterdam).
[4] M. Creutz, Phys. Rev. D21 (1980) 2308.

[5] For the most recent status, see e.g. Proc. of the Workshop Lattice Gauge Theory - A Challenge in Large-Scale Computing, Wuppertal, Nov. 1985, eds. R. Bunk, K.H. Mütter and K. Schilling (Plenum Press, London, New York, 1986).
[6] For an introduction, see the lectures of G. Schierholz given at the 27th Summer School of the Scottish Universities in Physics, St. Andrews, August 1984; see also Proc. of the Workshop Advances in Lattice Gauge Theory, Tallahassee, April 1985 (World Scientific, Singapore, 1985).
[7] K.H. Mütter and K. Schilling, Nucl. Phys. B230 (FS10) (1984) 275.
[8] A. König, K.H. Mütter and K. Schilling, Phys. Lett. B 147 (1984) 145; A. König, K.H. Mütter, K. Schilling and J. Smit, Wuppertal preprint WU B 86/12, to be published.
[9] R.D. Kenway, in Proc. of the Wuppertal workshop, ibid.
[10] D. Barkai, K. Moriarty and C. Rebbi, Phys. Lett. B 156 (1985) 385.
[12] J. Kogut, G.G. Batrouni et al., in Proc. of the Tallahassee workshop, ibid.
[13] E. Laermann, P.M. Zerwas et al., SU(2) Colour Gauge Theory with Dynamical Fermions, CERN preprint TH 4394/86.
[14] K.H. Mütter, invited talk held at the 1986 Brookhaven workshop on lattice gauge theory, to be published in the proceedings, eds. H. Satz et al. (Plenum Press, New York).
[15] O. Haan, E. Laermann, K.H. Mütter, K. Schilling, E. Schnepf and R. Sommer, Wuppertal preprint, in preparation.
[17] Caltech preprint CALT-68-1317.
[18] J. Beetem, M. Denneau and D. Weingarten, in Proc. of the 12th International Symposium on Computer Architecture (IEEE), Boston, June 1985.