Nuclear Physics B (Proc. Suppl.) 5A (1988) 339-344
North-Holland, Amsterdam

SIMULATION OF STATISTICAL MECHANICS MODELS ON PARALLEL COMPUTERS

Henrik BOHR
The Technical University of Denmark, Department of Structural Properties of Materials, Building 307, DK-2800 Lyngby, Denmark. Tel. No. 02-882488, Telex 37529 dthdta dk.

Models that cover a wide range of phenomena in statistical mechanics have been implemented and executed on a newly built parallel processor system. The models are spin systems, lattice gas models and flow models with a chaotic behaviour. According to the results, it is a great advantage to employ parallel processors to solve predominantly synchronous problems in statistical mechanics.

1. INTRODUCTION

Parallel computing has become a fashionable subject. This is because parallel computers aim at increasing the computer power by a factor of 10-100 over that of the more conventional vector processors. It is expected that a parallel processing set-up can achieve this goal if a large set of microprocessors can share the computations in an efficient way and process them in parallel.

Although general system software supporting parallel computations is still lacking, operational special purpose software is now available for parallel processing and covers quite a large area of computer methods applied to scientific problems. For such problems a fairly synchronous and uniform data-flow between eventual sub-divisions of each problem is desirable. Most problems concerning simulation of dynamical systems are of that kind. By dynamical systems we think of systems with mutually coupled elements that can be described by a set of coupled differential equations. They are solved numerically on computers by, e.g., the finite element method, and in terms of parallel processing a subdivision of the finite set of elements can be handled by each microprocessor. Parallel processing is straightforwardly applied if the coupling between the elements of such a dynamical system is local and the data flow between the subdivisions becomes homogeneous. Most dynamical models in physics are of that kind, especially lattice models. Thus the advantage of parallel processing should be obvious in these cases.

In the following chapters we shall present the architecture of the parallel computer system we constructed and then give some typical examples from statistical mechanics to which we applied the computer.


2. THE ARCHITECTURE OF A PARALLEL PROCESSOR

We shall in the following describe the parallel processor system, PALLAS, we have used for simulating our models. PALLAS (Parallel Lattice Simulator) was built at the Technical University of Denmark during 1985, and it has a straightforward parallel design as a hypercube in 4 dimensions.

FIGURE 1. The PALLAS system. The CPU part, the tower, contains 16 processor cards, each doing 5 million 16-bit TMS320 instructions per second.

The parallel computer system PALLAS² (see Fig. 1) basically contains four elements: the Central Processing Unit (CPU) that processes the data, the front-end computer (an IBM PC) that reads in and reads out data, a colour monitor and a plotter. The interesting part, the CPU unit, is built up of 16 equal processor cards arranged in a square lattice with nearest neighbour connections. Furthermore, the 16 processor cards are connected to the IBM PC, which is equipped with a special interface card. Finally, the system contains a clock generator that delivers synchronous clock signals of 20 MHz to all the processor cards. The topology of the 2-dimensional processor network is a torus, since the boundary processor cards on the hardwired lattice are connected to each other in the same row or column.
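To make the torus wiring concrete, the following sketch (our own illustration in present-day Python, not part of the PALLAS software; the function name and grid side are ours) computes the four neighbour cards of any of the 16 cards with wrap-around row and column arithmetic:

    # Minimal sketch: the 16 processor cards sit on a 4 x 4 grid whose
    # rows and columns wrap around, so every card has exactly four
    # nearest neighbours.
    def torus_neighbours(card, side=4):
        """Return the (north, south, west, east) card numbers on the torus."""
        row, col = divmod(card, side)
        north = ((row - 1) % side) * side + col
        south = ((row + 1) % side) * side + col
        west = row * side + (col - 1) % side
        east = row * side + (col + 1) % side
        return north, south, west, east

    # Card 0, a corner card, is wired to cards on the opposite edges of
    # its own row and column:
    print(torus_neighbours(0))   # (12, 4, 3, 1)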

The communication network basically consists of nearest neighbour buses and a global bus that connects the front-end computer to all the processor cards. The local communication is governed by communication ports (4 for each TMS 320), and the ICs contain a couple of 16-bit latches that store the incoming and outgoing data, respectively, thus providing a full duplex communication path between processors. The status of the ports (empty/filled) is available to the software, but the ports cannot interrupt the processors. The clock generator delivers synchronous signals to all processor cards so the communication flow can become fully synchronised.

Each processor card (see Fig. 2) consists of³ a TMS 320 microprocessor from Texas Instruments, communication ports to the four nearest neighbours, a port to the main bus, a hardware noise generator, a display circuit, a table-eprom, memory (4 Kbyte distributed on a user ram and a system ram) and finally a battery back-up. The memory is limited by the addressing capacity of the processors.

FIGURE 2. PALLAS processor card overview (TMS320, CMOS RAM, table-eprom and ports).

Concerning other parallel computer systems, Intel's iPSC computer⁴ is the first parallel processor to be widely marketed. Like PALLAS it belongs to the class of MIMD (Multiple Instruction Multiple Data) computers and is based on the design of a hypercube network with 128 processors working in parallel. The largest edition performs up to 10 MFLOPS (comparable to PALLAS' speed of 80 MIPS). Many scientific programmes have been successfully implemented on the Intel hypercube, but one serious drawback has been the slow ethernet communication and the relatively slow processor nodes.

We are in the process of constructing a parallel computer system (Fig. 3) based on a hypercube network in 6 dimensions (64 nodes) and with a global optical network connecting all processor nodes to a host computer. The local network is based on a shared memory dual port design. We have designed each node to be capable of 40 MFLOPS in peak and with a distributed memory of 8 Mbyte at each node. Each processor node can work separately as an accelerator to a normal PC.

FIGURE 3. The 64 node hypercube in 6 dimensions. Each processor node (a module with CPU and ports) has 6 communication lines.

As typical applications, the lattice models can easily be programmed on a machine with the lattice structure of PALLAS, simply by partitioning the physical lattice and assigning each part to a processor. In this case each processor runs a replica of the same programme, and the coordination of the processors' activity is reduced to the problem of sharing the data related to the boundaries of the partitions in the physical problem. We must stress that running the same programmes on all processors does not automatically imply running the same instructions as in a vector processor, because the execution can depend on the data. The main difference between the examples which we discuss below is exactly related to this point.
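The partition-and-share scheme can be sketched as follows (a present-day Python illustration under our own conventions, not the original TMS320 code; the patch size of 7 x 7 matches the x-y model run described below):

    import numpy as np

    # The physical lattice is cut into equal patches, one per processor;
    # the only coordination needed is the exchange of the edge rows and
    # columns with the four neighbouring patches on the processor torus.
    def split_lattice(lattice, side=4):
        """Cut a square lattice into side x side equal patches."""
        n = lattice.shape[0] // side
        return [[lattice[i*n:(i+1)*n, j*n:(j+1)*n].copy()
                 for j in range(side)] for i in range(side)]

    def boundary_exchange(patches):
        """Collect, for each patch, the edges of its four torus neighbours."""
        side = len(patches)
        halos = {}
        for i in range(side):
            for j in range(side):
                halos[i, j] = {
                    "north": patches[(i - 1) % side][j][-1, :],  # neighbour's bottom row
                    "south": patches[(i + 1) % side][j][0, :],   # neighbour's top row
                    "west":  patches[i][(j - 1) % side][:, -1],  # neighbour's right column
                    "east":  patches[i][(j + 1) % side][:, 0],   # neighbour's left column
                }
        return halos

    lattice = np.random.uniform(0.0, 2*np.pi, (28, 28))  # 4 x 4 patches of 7 x 7 spins
    patches = split_lattice(lattice)
    halos = boundary_exchange(patches)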

3. A SYNCHRONOUS PARALLEL LATTICE SET-UP OF THE x-y MODEL

In this section we will present the data from a typical synchronous lattice simulation that was performed in parallel by PALLAS. By a synchronous programme we mean a programme executed in such a way that all processors are transferring the same amount of data between each other in equal time intervals. Such a synchronous procedure is easily controlled by a clock generator that sends out signals synchronously to all processors. The corresponding programmes are obviously suited for a parallel set-up. Predominantly they utilize the local communication network and run very quickly. We chose the planar x-y spin model for this case and will briefly present its implementation on PALLAS.

The x-y lattice model has the following action in the Wilson form:

S = \beta \sum_i \{\cos(\phi_i - \phi_{i+x}) + \cos(-\phi_i + \phi_{i-x}) + \cos(\phi_i - \phi_{i+y}) + \cos(-\phi_i + \phi_{i-y})\},   (1)

where the \phi_i are the angular variables of each i-th spin and the indices are arranged on the lattice, with the index i labelling each cross and with x and y indicating the two directions.

e q u a t i o n system d~ dt

c~S = - -+ rli, 3(~

(2)

the

same p r o g r a m m e s on all p r o c e s s o r s does n o t

w h e r e t h e noise f u n c t i o n

automatlcally imply running

t i o n s t h a t lead to a d e s i r e d p r o b a b i l ~ t y d i s t r i -

t h e same i n s t r u c t i o n s

as m a v e c t o r p r o c e s s o r b e c a u s e t h e e x e c u t i o n can d e p e n d on t h e d a t a .

bution for the fields

rl t m v o l v e d f l u c t u a -

~i

T h e mam d i f f e r e n c e

b e t w e e n t h e e x a m p l e s w h i c h we d i s c u s s below is e x a c t l y r e l a t e d to t h i s p o i n t .

P(@i ) = e x p ( - S ( @ i ) I o ) . T h e no~se f u n c t i o n s

(3)

q~ a r e G a u s s i a n r a n d o m

v a r i a b l e s n o r m a l i z e d to t h e w l d t h 3. A S Y N C H R O N O U S P A R A L L E L L A T T I C E UP OF THE x - v MODEL

In t h i s s e c t i o n we will p r e s e n t t h e d a t a from a typical synchronous

o by

SETTli(t)~j(t')

= 2~6q6(t

t').

(4)

In our case the differential equations in eq. (2) are solved numerically on PALLAS by iteratively solving the following difference equation system

\phi_i^{\rm new} = \phi_i^{\rm old} + \beta^{-1} \sum_j \sin(\phi_i^{\rm old} - \phi_{i+j}^{\rm old})\,\Delta t + \eta_i.   (5)

The index j in this case represents the nearest neighbour spins to the i-th spin (4 to each i-site). In our case each processor took care of an array of 7 x 7 spins, so altogether the lattice size was 16 x 7 x 7.
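A serial sketch of the iteration (5) is given below (our own Python illustration of one processor's patch, not the TMS320 programme; beta, the step size and the lattice size are illustrative choices, and the discretized noise variance 2*sigma*dt is the standard choice implied by eq. (4)):

    import numpy as np

    rng = np.random.default_rng(0)
    L, beta, sigma, dt = 28, 1.04, 1.0, 0.01
    phi = rng.uniform(0.0, 2*np.pi, (L, L))

    def langevin_step(phi):
        """One update of eq. (5) on a periodic patch."""
        drift = np.zeros_like(phi)
        # Sum over the four nearest neighbours of eq. (1); np.roll
        # implements the periodic (torus) boundary conditions.
        for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
            drift += np.sin(phi - np.roll(phi, shift, axis=axis))
        eta = rng.normal(0.0, np.sqrt(2*sigma*dt), phi.shape)
        return phi + (1.0/beta) * drift * dt + eta

    for _ in range(100):
        phi = langevin_step(phi)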

FIGURE 4. The specific heat C_v as a function of \beta in the x-y model.

In Fig. 4 we present the curve of the specific heat as a function of \beta. The curve has a peak around \beta = 1.04, corresponding to a second order phase transition well known in the literature⁶. A test of this programme (fully vectorized) on a CRAY XMP gave roughly the same execution speed.

4. AN ASYNCHRONOUS PARALLEL LATTICE SET-UP OF A MOLECULAR GAS

The PALLAS computer system was originally designed for synchronous programmes. It therefore caused some problems to run asynchronous programmes, i.e. programmes that involve a different set of information to be transferred between the processors in a given time interval, according to the position of the processors. This means that if no precautions are taken, data might be read on top of current data on one particular input port of a certain processor. We managed these problems by a hardware handshake on each port that could delay new data in a "would be" bottleneck and, together with software precautions, could avoid any problems of overflow.
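A toy model of that handshake is sketched below (our own illustration, not the actual hardware protocol): each port is modelled as a one-deep buffer standing in for the 16-bit latch pair, and a sender that finds the port still filled is simply delayed until the receiver has drained it, so data can never be overwritten.

    import queue
    import threading

    port = queue.Queue(maxsize=1)   # one-deep buffer: "empty" or "filled"

    def sender():
        for word in range(5):
            port.put(word)          # blocks (is delayed) while the port is filled

    def receiver():
        for _ in range(5):
            print("received", port.get())   # blocks while the port is empty

    t1 = threading.Thread(target=sender)
    t2 = threading.Thread(target=receiver)
    t1.start(); t2.start()
    t1.join(); t2.join()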

We chose a molecular lattice gas model⁷ to represent such asynchronous programmes that should work as test programmes on the PALLAS computer, besides the wish to gain more insight into the model by high-statistics simulations. Actually this lattice gas model was not the most general asynchronous programme that could be thought of, since the amount of data exchanged and their order are fixed and only the arrival time is undefined. The model is an Ising type theory with the following bond Hamiltonian

H + \mu N = -\frac{J}{2} \sum_{(ij)} \prod_{k(i)} (1 - c_k)\, c_i c_j \prod_{l(j)} (1 - c_l) - \mu \sum_i c_i,   (6)

where c stands for an occupation number, c \in \{0, 1\}, and \mu is a chemical potential controlling the average particle number \langle N \rangle. Furthermore, i, j are neighbouring sites of a lattice, k(i) are all first neighbours of site i which are different from j, and similarly l(j) are all first neighbours of j different from i. The two projectors \prod(1 - c_{k,l}) ensure that a bond connecting two neighbouring particles will pay energy -J only if each of the particles involved has no other bonds. This fact implies a next to nearest neighbour interaction. The simulation is carried out using a standard Metropolis Monte Carlo algorithm. The model contains (see refs. 2, 7) 4 phases, among them a molecular crystal phase, a molecular fluid phase and an atomic fluid phase.

5. A FLOW MODEL WITH A CHAOTIC BEHAVIOUR

Next we have chosen an interesting statistical mechanics model in biophysics. The model⁸ describes a coupled system of functional units (nephrons) in the kidney. The model is basically a hydrodynamical model and its solution requires a large amount of computer power, but it can easily be simulated on the parallel computer PALLAS. In the model, as well as in the experimental data, there are strong indications of very complex chaotic phenomena arising from more simple chaotic attractors found in the dynamics of the single, undisturbed nephron. The simulations are based on a 2-dimensional grid of 16 x 16 nephrons with nearest-neighbour coupling.

A nephron is basically a long tubule with the upper (closed) end containing 20-40 capillary loops arranged in parallel. A simple, dynamical model of the pressure and flow regulation in a nephron is given by 8 coupled 1st order differential equations. The model emphasizes the function of the tubuloglomerular feedback loop. As argued in refs. 8, 9, the slow oscillations in tubular pressure reproduced in Fig. 5 are believed to arise from temporal self-organization in that particular loop.

Due to the very dense packing of nephrons in the kidney, one can expect a certain degree of interaction between neighbouring nephrons. In the simulation the nephron system is arranged in a 2-dimensional lattice with 16 x 16 nodes, each representing a single nephron coupled to the nearest neighbours in all directions. Each of the 16 processing units on PALLAS simulates a small 4 x 4 sub-system of the lattice. This is done in parallel with respect to time and with the help of synchronous clock signals. Updating of the common borders constitutes the essential information flow between the processor units. The 8 coupled differential equations describing each nephron are solved as difference equations by the forward Euler numerical method with a small time step.
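The structure of this scheme can be sketched as follows (our own Python illustration; the actual 8 equations per nephron are given in refs. 8, 9, and here a simple self-oscillating two-variable system stands in for them, with eps playing the role of the nearest-neighbour coupling):

    import numpy as np

    N, dt, eps = 16, 1e-3, 0.2
    rng = np.random.default_rng(2)
    p = rng.normal(1.0, 0.1, (N, N))   # stand-in for the tubular pressure
    q = np.zeros((N, N))               # stand-in for the remaining variables

    def neighbour_mean(x):
        """Average over the four torus neighbours: the coupling term."""
        return 0.25 * (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                       + np.roll(x, 1, 1) + np.roll(x, -1, 1))

    for _ in range(50_000):
        # Forward Euler with a small time step: x(t+dt) = x(t) + dt*f(x(t)).
        dp = q + eps * (neighbour_mean(p) - p)
        dq = 0.5 * (1.0 - p**2) * q - p
        p, q = p + dt * dp, q + dt * dq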

Results of the simulation⁹ are shown in Fig. 5, where the tubular pressure in one nephron out of 256 exhibits an oscillatory behaviour. The first curve, with no coupling to neighbours, shows regular oscillations, while the second curve, with a weak coupling, shows irregular oscillations. Figure 6 shows the corresponding phase portraits of the simulated pressure oscillations with an attractor.

FIGURE 5. Tubular pressure PT (mm Hg) versus time (min) at Pa = 100 mm Hg. A. Experimental results showing oscillations in the proximal intratubular pressure PT inside a rat kidney. B. Simulation results showing the pressure PT in a single nephron out of 256 nephrons with no nearest neighbour coupling. C. Similar results as in B but with a weak coupling (0.2). Often oscillations become much more irregular than shown.

FIGURE 6. Reconstruction of 3-dimensional phase portraits from a single time series using Takens' scheme. Data correspond to the situation in Fig. 5 C. A and B are the same object viewed from different angles.

REFERENCES

1. O.A. McBryan and E.F. Van de Velde, Hypercube algorithms and implementations, SIAM Journal on Scientific and Statistical Computing 8 (1987) 227.
2. H. Bohr, T. Petersen, B. Rathjen, E. Katznelson and A. Nobile, Computer Physics Communications 42 (1986) 11.
3. Texas Instruments TMS 32010 User's Guide, P. Strzelecki, ed. (1983).
4. E.J. Lerner, High Technol. 20 (1985).
5. F. Fucito, E. Marinari, G. Parisi, C. Rebbi et al., Nuclear Phys. B180 [FS2] (1981) 389.
6. J.B. Kogut, Rev. Mod. Phys. 51 (1979) 659.
7. M. Parrinello and E. Tosatti, Phys. Rev. Lett. 49 (1982) 1165.
8. K.S. Jensen, E. Mosekilde and N.H. Holstein-Rathlou, Mondes en Développement, Tome 14, No. 53 (1986).
9. K.S. Jensen, H. Bohr, N.H. Holstein-Rathlou, T. Petersen and B. Rathjen, in: Proceedings of the Workshop on Statistical Physics, p. 137, eds. H. Bohr and O.G. Mouritsen, DTH, Lyngby, Denmark (1987).