QCDPAX: Present status and first physical results


Nuclear Physics B (Proc. Suppl.) 20 (1991) 141-144, North-Holland


QCDPAX: PRESENT STATUS AND FIRST PHYSICAL RESULTS*

QCDPAX COLLABORATION

Y. IWASAKI, K. KANAYA, T. YOSHIE, T. HOSHINO, T. SHIRAKAWA, Y. OYANAGI, S. ICHII and T. KAWAI

Institute of Physics, Institute of Engineering Mechanics and Institute of Information Sciences, University of Tsukuba, Ibaraki 305, JAPAN; National Laboratory for High Energy Physics, Ibaraki 305, JAPAN; Department of Physics, Keio University, Yokohama 223, JAPAN

The first successful operation of QCDPAX with 432 nodes is reported. The peak speed is about 12.5 GFLOPS and the total memory is 2.6 GBytes. It is planned to increase the number of nodes up to 480 in the near future. After brief descriptions of the system architecture and hardware as well as of the software development, the results of a study of the phase transition in pure gauge theory are briefly presented.

1. INTRODUCTION

QCDPAX1,2 is a parallel computer designed for simulations in lattice QCD. The QCDPAX project started about three years ago, in 1987, as a joint collaboration of the physical-science and computer-science groups, mainly at the University of Tsukuba. It was funded by the Japanese government with about 300 million yen, roughly equal to 2 million dollars. The machine was designed by the QCDPAX collaboration and manufactured by Anritsu Corporation. It is the fifth of the PAX series.3

The status of QCDPAX as of September 1989 was reported4 at Capri. At that time 240 boards were installed and QCDPAX was in the final debugging stage. Now 432 boards are installed and QCDPAX is running, and we have first results from it. The peak speed at present is about 12.5 GFLOPS; this is for the case of 432 boards. We are planning to increase the number of boards up to 480, in which case the peak speed will be 14 GFLOPS. The sustained speed for pure gauge theory is about 2.5 GFLOPS; this number is roughly obtained by assuming that the number of floating-point operations is about 5000 for a one-link update. We use a pseudo-heat-bath method with 3 SU(2) multiplications and with 8 hits.

*Presented by Y. Iwasaki
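As a quick illustration of the arithmetic behind these numbers, the estimate can be reproduced in a few lines (a sketch only; the 5000 flops per link update and the 2.5 GFLOPS sustained speed are taken from the text, while the example lattice size and all variable names are ours):

```python
# Rough performance arithmetic for a sustained-speed estimate of the
# kind quoted in the text: link updates per second, and the time for
# one sweep of an example lattice.

FLOPS_PER_LINK_UPDATE = 5_000    # from the text: ~5000 flops per one-link update
SUSTAINED_FLOPS = 2.5e9          # from the text: ~2.5 GFLOPS for pure gauge theory

# Link updates per second at the sustained speed.
link_updates_per_sec = SUSTAINED_FLOPS / FLOPS_PER_LINK_UPDATE

# Illustrative lattice (the 24^2 x 36 x 4 size is only an example here),
# with 4 links per site in four dimensions.
sites = 24 * 24 * 36 * 4
links = 4 * sites
seconds_per_sweep = links / link_updates_per_sec

print(f"{link_updates_per_sec:.0f} link updates/s")  # 500000
print(f"{seconds_per_sweep:.2f} s per sweep")        # 0.66
```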

The sustained speed for full QCD with Wilson quarks will be 4-5 GFLOPS. In this article we first give a brief description of the architecture and hardware of QCDPAX. Then the software development will be described. Finally, some of the first physical results from QCDPAX will be presented. Detailed results and analyses of the physical results are described in Kanaya's contribution to this conference.5

2. HARDWARE

The global architecture of QCDPAX is shown in Fig.1. Identical processing units (PU's) are interconnected in a toroidal two-dimensional nearest-neighbor mesh; it is a homogeneous array of PU's. The host computer is the workstation Sun-3/260.

QCDPAX is a local-memory MIMD machine. Each PU has its own independent clock and works completely asynchronously; therefore there is no clock-skew problem. Secondly, algorithms are more flexible on a MIMD machine than on a SIMD machine. For example, hyperplane incomplete LU preconditioning6 can be implemented on QCDPAX, and we have indeed done so. However, the efficiency of the parallelism is not good, and it therefore turned out that a simpler preconditioning is more suitable for QCDPAX, at least for moderately light quark masses, when judged in terms of CPU time. Thirdly, on a MIMD machine we are able to change the lattice size continuously by loading different programs on the PU's. In order to investigate finite-size scaling we have to change the lattice size stepwise, which is in general hard on a SIMD machine; this flexibility is therefore crucial for studying the nature of a phase transition. However, this point has not been tried yet.

0920-5632/91/$3.50 © Elsevier Science Publishers B.V. (North-Holland)

Figure 1: System configuration (host computer Sun-3/260, host-PU interface (HPI), graphic display, and the PU array)

One PU is installed on a six-layer printed board of 367 mm x 400 mm (Fig.2). The CPU is Motorola's 68020 and the FPU (floating-point processing unit) is the L64133 by LSI Logic. The L64133 is a single-stage (or scalar) processor; this fact simplified the software development. The controller of the FPU is manufactured with gate-array technology. The local memory, where the program and some intermediate data are stored, is 4 MByte with 100 ns 1 Mbit DRAMs, and the floating-point data memory is 2 MByte with 35 ns CMOS SRAMs.

A PU shares a two-port RAM, which we call communication memory (CM), with each of its four nearest neighbors. It also shares a two-port RAM (the ferry) with the host through the HPI. A common bus connects all PU's and the HPI. The memory of each PU is mapped on the address space of the host computer; the HPI serves to select which PU's are mapped to the address space of the host computer.
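The toroidal two-dimensional nearest-neighbor mesh of PU's can be sketched as follows (an illustrative model only; the function name and the 6 x 4 example grid are our assumptions and not part of the QCDPAX software):

```python
# Neighbor addressing on a 2-D toroidal (wrap-around) mesh of PUs,
# mimicking the nearest-neighbor interconnect described in the text.
# The grid dimensions below are purely illustrative.

def torus_neighbors(x, y, width, height):
    """Return the four nearest neighbors of PU (x, y) with wrap-around."""
    return {
        "east":  ((x + 1) % width, y),
        "west":  ((x - 1) % width, y),
        "north": (x, (y + 1) % height),
        "south": (x, (y - 1) % height),
    }

# Example: on a 6 x 4 torus, the corner PU (0, 0) wraps to the far edges.
n = torus_neighbors(0, 0, 6, 4)
print(n["west"])   # (5, 0): wraps around in the x direction
print(n["south"])  # (0, 3): wraps around in the y direction
```

Because every PU has the same four-neighbor view of the mesh, the same communication pattern works regardless of where a PU sits in the array, which is what makes a homogeneous toroidal layout attractive.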

Figure 2: A processing unit board

Figure 3: A module


Figure 4: Five modules in a cabinet

Figure 5: QCDPAX

The computational results from each PU are output to the host through the two-port RAM and the common bus. The HPI is connected to a graphic display with a video processor to display the results on line.

Sixteen PU's, forming a 4 x 4 array, are assembled in a module as in Fig.3. Five modules are installed in a cabinet (Fig.4). QCDPAX is composed of six cabinets, forming a hexagon (Fig.5).

3. SOFTWARE DEVELOPMENT

The user of QCDPAX should prepare two programs: one for the host computer and the other for the PU's. The program for the host computer is written in the language C. The PU program is written in a newly developed language, psc (parallel scientific C). A psc program is compiled to an assembly language, qfa (quick floating assembler), which is specially designed for our PU; the user can optimize the code at the qfa level. The qfa program is assembled to the usual assembler and then to machine code for the MC68020. The host program loads the machine code into the PU array before starting the parallel task. Some examples of psc and qfa programs were given last year4 and therefore we do not repeat them here.

In principle, we can write a lattice QCD program for the PU's using only psc. However, the floating-point operations should be highly optimized, and unfortunately our compiler is not good enough to highly optimize the code; therefore we have to tune the qfa program by hand. Our main strategy is to prepare a library for the fundamental calculations in lattice QCD: we write the routines for the fundamental calculations in the qfa language and then use them through function calls. Using this library, we have completed the pure gauge program with the 3-subgroup pseudo-heat-bath algorithm as well as the program with the hybrid Monte Carlo algorithm for full QCD with Wilson quarks.

We have also developed routines for checking the calculations. For the data transfers we use checksums. For the whole calculation we make a self-check, that is, a duplication of part of the calculation. We start by doing the duplication for 100% of the calculation and, if errors are not observed for a few days, gradually decrease the percentage of the self-check, finally down to 20%. We also check the unitarity of the SU(3) matrices at each update.
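The unitarity check mentioned above can be sketched as follows (a minimal illustration; the helper names and the tolerance are our choices, not the actual QCDPAX routines, which also keep det U close to 1 for SU(3)):

```python
# Sketch of a unitarity check for a 3 x 3 complex matrix U:
# U is (approximately) unitary when U * U^dagger = 1 within a tolerance.
# Matrices are plain lists of lists of Python complex numbers.

def dagger(U):
    """Conjugate transpose of a 3 x 3 complex matrix."""
    return [[U[j][i].conjugate() for j in range(3)] for i in range(3)]

def matmul(A, B):
    """3 x 3 complex matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def is_unitary(U, tol=1e-12):
    """Check U * U^dagger == identity, element by element, within tol."""
    P = matmul(U, dagger(U))
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(3) for j in range(3))

# The identity is trivially unitary; a rescaled matrix is not.
I3 = [[1 + 0j if i == j else 0j for j in range(3)] for i in range(3)]
print(is_unitary(I3))                                   # True
print(is_unitary([[2 * x for x in row] for row in I3])) # False
```

In a production run such a test catches slow drift of the link matrices away from the group manifold due to rounding, as well as occasional hardware errors of the kind the self-check is designed to detect.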

4. PHYSICAL RESULTS

We have chosen a study of the phase transition in pure gauge theory as the first production run on QCDPAX, because this study also provides a severe test of QCDPAX, through comparison with precise results by other groups. We made 710,000 sweeps on a 24² × 36 × 4 lattice at β = 5.6925. The distribution of the plaquette value is shown in Fig.6.


Figure 6: Histogram of the plaquette

The double-peak structure is clearly seen. The distribution of the Polyakov loop in the complex plane is also shown in Fig.7. There are four peaks: three of them correspond to the deconfining phase, while the one at the center corresponds to the confining phase. Detailed analyses5 lead to results which are completely consistent with the other results7. Our statistics are much higher than the previous ones, and therefore we are able to determine precisely the gaps of the energy and the pressure across the transition. We also have results with N_t = 6; see ref.5.

Figure 7: Histogram of the Polyakov loop on the complex plane

We have also investigated the domain structure of the vacuum around β = β_c, by making simulations on a 90 × 120 × 10 × 4 lattice and also on a 60² × 72 × 4 lattice. What we found are the following two things. First, although we are in general unable to see the domain structure without smearing, we are able to see it clearly by smearing the Polyakov loops over small volumes. Secondly, when the lattice is asymmetric like 90 × 120 × 10 × 4, the domain structure with six or seven domains persists for a long period. On the other hand, when the lattice is more symmetric like 60² × 72 × 4, only one or two domains remain after about 1000 sweeps.

Our schedule of calculations with QCDPAX is the following: we will next calculate the hadron spectrum in the quenched approximation. Then we will do full QCD calculations of the hadron spectrum or at finite temperature.

This project is supported by the Grant-in-Aid for Specially Promoted Research of the Ministry of Education, Science and Culture of the Japanese Government (No. 62060001). It is a pleasure to acknowledge the strong support and encouragement of Professor K. Nishijima and Professor A. Arima. The authors are grateful to the staff of Anritsu Corporation for their help in computer system development.

REFERENCES

1. Y. Iwasaki, T. Hoshino, T. Shirakawa, Y. Oyanagi and T. Kawai, Computer Physics Communications 49 (1988) 449.

2. T. Shirakawa, T. Hoshino, Y. Oyanagi, Y. Iwasaki, T. Yoshie, K. Kanaya, S. Ichii and T. Kawai, Proceedings of Supercomputing '89, Reno, USA, Nov. 13-17, 1989.

3. T. Hoshino, PAX Computer: High-Speed Parallel Processing and Scientific Computing (Addison-Wesley, New York, 1989).

4. Y. Iwasaki, K. Kanaya, T. Yoshie, T. Hoshino, T. Shirakawa, Y. Oyanagi, S. Ichii and T. Kawai, Nuclear Physics B (Proc. Suppl.) 17 (1990) 259.

5. K. Kanaya, contribution in this volume.

6. Y. Oyanagi, Comp. Phys. Comm. 42 (1986) 333.

7. A. Ukawa, Nucl. Phys. B (Proc. Suppl.) 17 (1990) 118.