Nuclear Physics B (Proc. Suppl.) 5A (1988) 339-344
North-Holland, Amsterdam

SIMULATION OF STATISTICAL MECHANICS MODELS ON PARALLEL COMPUTERS

Henrik BOHR
The Technical University of Denmark, Department of Structural Properties of Materials, Building 307, DK-2800 Lyngby, Denmark. Tel. No. 02-882488, Telex 37529 dthdta dk.

Models that cover a wide range of phenomena in statistical mechanics have been implemented and executed on a newly built parallel processor system. The models are spin systems, lattice gas models and flow models with a chaotic behaviour. According to the results, it is a great advantage to employ parallel processors to solve predominantly synchronous problems in statistical mechanics.
1. INTRODUCTION

Parallel computing has become a fashionable subject. This is because parallel computers aim at increasing the computer power by a factor of 10-100 over that of the more conventional vector processors. It is expected that a parallel processing set-up can achieve this goal if a large set of microprocessors can share the computations in an efficient way and process them in parallel.

Although general system software supporting parallel computations is still lacking, operational special purpose software is now available for parallel processing and covers quite a large area of computer methods applied to scientific problems. For such problems a fairly synchronous and uniform data-flow between eventual sub-divisions of each problem is desirable. Most problems concerning simulation of dynamical systems are of that kind. By dynamical systems we think of systems with mutually coupled elements that can be described by a set of coupled differential equations. They are solved numerically on computers by, e.g., the finite element method, and in terms of parallel processing a subdivision of the finite set of elements can be handled by each microprocessor. Parallel processing is straightforwardly applied if the coupling between the elements of such a dynamical system is local and the data flow between the subdivisions becomes homogeneous. Most dynamical models in physics, and especially lattice models, are of that kind. Thus the advantage of parallel processing should be obvious in these cases.

In the following chapters we shall present the architecture of the parallel computer system we constructed and then give some typical examples from statistical mechanics to which we successfully applied the computer.
2. THE ARCHITECTURE OF A PARALLEL PROCESSOR

We shall in the following describe the parallel processor system, PALLAS, we have used for simulating our models. PALLAS (Parallel Lattice Simulator) was built at the Technical University of Denmark during 1985, and it has a straightforward parallel design as a hypercube in 4 dimensions.

The parallel computer system PALLAS² (see Fig. 1) basically contains four elements: the Central Processing Unit CPU that processes the data, the front-end computer (an IBM PC) that reads in and reads out data, a colour monitor and a plotter. The interesting part, the CPU unit, is built up of 16 equal processor cards arranged in a square lattice with nearest neighbour connections. Furthermore, the 16 processor cards are connected to the IBM PC equipped with a special interface card.
Finally, the system contains a clock generator that delivers synchronous clock signals of 20 MHz to all the processor cards. The topology of the 2-dimensional processor network is a torus, since the boundary processor cards on the hardwired lattice are connected to each other in the same row or column.

FIGURE 1. The PALLAS system. The CPU-part, the tower, contains 16 processor cards each doing 5 million 16 bit TMS320 instructions per second.

The communication network basically consists of nearest neighbour buses and a global bus that connects the front-end computer to all the processor cards. The local communication is governed by communication ports (4 for each TMS 320), and the ICs contain a couple of 16-bit latches that store the incoming and outgoing data, respectively, thus providing a full duplex communication path between processors. The status of the ports (empty/filled) is available to the software, but the ports cannot interrupt the processors. The clock generator delivers synchronous signals to all processor cards so the communication flow can become fully synchronised.
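As a minimal illustration of this torus wiring (not PALLAS system code; the row-by-row card numbering and the Python form are our own assumptions), the sketch below computes the four nearest neighbours of each of the 16 processor cards with periodic wrap-around.

```python
# Hypothetical illustration of the 4 x 4 torus of processor cards used by PALLAS.
# Cards are numbered 0..15 row by row; this numbering is an assumption.

GRID = 4  # 4 x 4 = 16 processor cards

def torus_neighbours(card: int) -> dict:
    """Return the card numbers of the four nearest neighbours on the torus."""
    row, col = divmod(card, GRID)
    return {
        "north": ((row - 1) % GRID) * GRID + col,
        "south": ((row + 1) % GRID) * GRID + col,
        "west":  row * GRID + (col - 1) % GRID,
        "east":  row * GRID + (col + 1) % GRID,
    }

# Example: card 0 (a corner card) wraps around to cards 12 and 3.
print(torus_neighbours(0))   # {'north': 12, 'south': 4, 'west': 3, 'east': 1}
```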
Each processor card (see Fig. 2) consists of a TMS 320 microprocessor from Texas Instruments³, communication ports to the four nearest neighbours, a port to the main bus, a hardware noise generator, a display circuit, a table-eprom, memory (4 Kbyte distributed on a user ram and a system ram) and finally a battery backup. The memory is limited by the addressing capacity of the processors.

FIGURE 2. PALLAS processor card overview.

Concerning other parallel computer systems, the Intel iPSC computer⁴ is the first parallel processor to be widely marketed. Like PALLAS it belongs to the class of MIMD (Multiple Instruction Multiple Data) computers and is based on the design of a hypercube network with 128 processors working in parallel. The largest edition performs up to 10 MFLOPS (comparable to PALLAS' speed of 80 MIPS). Many scientific programmes have been successfully implemented on the Intel hypercube, but one serious drawback has been the slow ethernet communication and the relatively slow processor nodes.
We are in the process of constructing a parallel computer system (Fig. 3) based on a hypercube network in 6 dimensions (64 nodes) and with a global optical network connecting all processor nodes to a host computer. The local network is based on a shared memory dual port design. We have designed each node to be capable of 40 MFLOPS in peak and with a distributed memory of 8 Mbyte at each node. Each processor node can work separately as an accelerator to a normal PC.

FIGURE 3. The 64 node hypercube in 6 dimensions. Each processor node has 6 communication lines.
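Since the nodes of a 6-dimensional hypercube can be labelled with 6-bit addresses, two nodes are directly connected exactly when their addresses differ in a single bit. The short sketch below illustrates this addressing; it is only an illustration of the topology, not software for the machine described here.

```python
# Hypothetical illustration of 6-d hypercube addressing (64 nodes, 6 links per node).

DIM = 6  # 2**6 = 64 nodes

def hypercube_neighbours(node: int) -> list:
    """Neighbours of `node` differ from it in exactly one of the 6 address bits."""
    return [node ^ (1 << bit) for bit in range(DIM)]

print(hypercube_neighbours(0))   # [1, 2, 4, 8, 16, 32]
```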
As typical applications, the lattice models can easily be programmed on a machine with the lattice structure of PALLAS, simply by partitioning the physical lattice and assigning each part to a processor. In this case each processor runs a replica of the same programme, and the coordination of the processors' activity is reduced to the problem of sharing the data related to the boundaries of the partitions in the physical problem. We must stress that running the same programmes on all processors does not automatically imply running the same instructions as in a vector processor, because the execution can depend on the data. The main difference between the examples which we discuss below is exactly related to this point.
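The partitioning strategy described above is essentially a boundary ("halo") exchange. The following serial Python sketch only mimics the data each processor would need from its four neighbours on the torus; the 4 x 4 processor grid matches PALLAS, while the block size, array layout and function names are illustrative assumptions.

```python
import numpy as np

# Serial mock-up of the boundary ("halo") sharing between lattice partitions.
P, B = 4, 7                                  # processors per side, lattice sites per block side
lattice = np.random.rand(P * B, P * B)       # the full physical lattice

def block(p_row, p_col):
    """The sub-lattice assigned to the processor at grid position (p_row, p_col)."""
    return lattice[p_row * B:(p_row + 1) * B, p_col * B:(p_col + 1) * B]

def halo(p_row, p_col):
    """Boundary data this processor needs from its four neighbours on the torus."""
    return {
        "from_north": block((p_row - 1) % P, p_col)[-1, :],   # their bottom row
        "from_south": block((p_row + 1) % P, p_col)[0, :],    # their top row
        "from_west":  block(p_row, (p_col - 1) % P)[:, -1],   # their rightmost column
        "from_east":  block(p_row, (p_col + 1) % P)[:, 0],    # their leftmost column
    }

edges = halo(0, 0)                           # what processor (0, 0) receives each sweep
print({k: v.shape for k, v in edges.items()})
```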
3. A SYNCHRONOUS PARALLEL LATTICE SET-UP OF THE x-y MODEL

In this section we will present the data from a typical synchronous lattice simulation that was performed in parallel by PALLAS. By a synchronous programme we mean a programme executed in such a way that all processors are transferring the same amount of data between each other in equal time intervals. Such a synchronous procedure is easily controlled by a clock generator that sends out signals synchronously to all processors. The corresponding programmes are obviously suited for a parallel set-up; predominantly they utilize the local communication network and run very quickly. We chose the planar x-y spin model for this case and will briefly present its implementation on PALLAS.
The x-y lattice model has the following action in the Wilson form:

    S = \beta \sum_i \left[ \cos(\phi_i - \phi_{i+x}) + \cos(-\phi_i + \phi_{i-x}) + \cos(\phi_i - \phi_{i+y}) + \cos(-\phi_i + \phi_{i-y}) \right],    (1)

where φ_i are the angular variables of each i-th spin and the indices are arranged on the lattice, with the index i labelling each cross and with x and y indicating the two directions.

We have used the Langevin approach⁵ to analyse this model, which means that we have to solve the following first order differential equation system

    \frac{d\phi_i}{dt} = -\frac{\partial S}{\partial \phi_i} + \eta_i,    (2)

where the noise functions η_i involve fluctuations that lead to the desired probability distribution for the fields φ_i,

    P(\phi_i) = \exp(-S(\phi_i)/\sigma).    (3)

The noise functions η_i are Gaussian random variables normalized to the width σ by

    \langle \eta_i(t)\, \eta_j(t') \rangle = 2\sigma\, \delta_{ij}\, \delta(t - t').    (4)
In our case the differential equations in eq. (2) are solved numerically on PALLAS by iteratively solving the following difference equation system

    \phi_i^{\mathrm{new}} = \phi_i^{\mathrm{old}} + \beta \sum_j \sin(\phi_i^{\mathrm{old}} - \phi_{i+j}^{\mathrm{old}})\, \Delta t + \eta_i.    (5)

The index j in this case represents the nearest neighbour spins to the i-th spin (4 to each i-site). In our case each processor took care of an array of 7 x 7 spins, so altogether the lattice size was 16 x 7 x 7.
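A minimal serial sketch of one iteration of eq. (5) is given below, assuming the discretized noise variance 2σΔt that follows from eq. (4). The lattice size, β, σ and Δt are illustrative, and this Python code only mirrors the arithmetic of the update, not the parallel PALLAS implementation.

```python
import numpy as np

# Minimal serial sketch of the Langevin update (5) for the planar x-y model on a
# periodic lattice.  The noise is drawn with variance 2*sigma*dt, the discretized
# form of eq. (4); all parameter values are illustrative.
rng = np.random.default_rng(0)
L, beta, sigma, dt = 28, 1.04, 1.0, 0.01           # 28 x 28 = 16 blocks of 7 x 7 spins
phi = rng.uniform(0.0, 2.0 * np.pi, size=(L, L))   # angular variables phi_i

def langevin_sweep(phi):
    """One iteration of eq. (5): drift from the four nearest neighbours plus noise."""
    drift = np.zeros_like(phi)
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        drift += np.sin(phi - np.roll(phi, shift, axis=axis))
    noise = rng.normal(0.0, np.sqrt(2.0 * sigma * dt), size=phi.shape)
    return phi + beta * drift * dt + noise

for _ in range(100):
    phi = langevin_sweep(phi)
print(float(np.mean(np.cos(phi - np.roll(phi, 1, axis=0)))))   # a nearest-neighbour correlation
```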
FIGURE 4. The specific heat C_v as a function of β in the x-y model.

In Fig. 4 we present the curve of the specific heat as a function of β. The curve has a peak around β = 1.04, corresponding to a second order phase transition well known in the literature⁶. A test of this programme (fully vectorized) on a CRAY XMP gave roughly the same execution speed.
4. AN ASYNCHRONOUS PARALLEL LATTICE SET-UP OF A MOLECULAR GAS

The PALLAS computer system was originally designed for synchronous programmes. It therefore caused some problems to run asynchronous programmes, i.e. programmes that involve a different set of information to be transferred between the processors in a given time interval according to the position of the processors. This means that if no precautions are taken, data might be read on top of current data on one particular input port of a certain processor. We managed these problems by a hardware handshake on each port that could delay new data in a "would be" bottleneck and, together with software precautions, could avoid any problems of overflow.
We chose a molecular lattice gas model⁷ to represent such asynchronous programmes, both to serve as a test programme on the PALLAS computer and to gain more insight into the model by high-statistics simulations. Actually, this lattice gas model was not the most general asynchronous programme that could be thought of, since the amount of data exchanged and their order are fixed and only the arrival time is undefined. The model is an Ising type theory with the following bond Hamiltonian

    H - \mu N = -\frac{J}{2} \sum_{(ij)} \prod_{k(i)} (1 - c_k)\; c_i c_j \prod_{l(j)} (1 - c_l) \;-\; \mu \sum_i c_i,    (6)

where c stands for an occupation number, c_i ∈ {0, 1}, and μ is a chemical potential controlling the average particle number ⟨N⟩. Furthermore, i, j are neighbouring sites of a lattice, k(i) are all first neighbours of site i which are different from j, and similarly l(j) are all first neighbours of j different from i. The two projectors Π(1 - c_{k,l}) ensure that a bond connecting two neighbouring particles will pay the energy -J only if each of the particles involved has no other bonds. This fact implies a next to nearest neighbour interaction. The simulation is carried out using a standard Metropolis Monte Carlo algorithm. The model contains 4 phases (see refs. 2,7): a molecular crystal phase, a molecular fluid phase and an atomic fluid phase.
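The following sketch shows one Metropolis trial move for the Hamiltonian (6), with the projector condition spelled out: a bond contributes -J only when neither of its two occupied ends has any other occupied neighbour. It is a slow serial illustration (the full energy is recomputed for each move) with illustrative values of J, μ and the temperature, not the programme run on PALLAS.

```python
import numpy as np

# Serial sketch of a Metropolis step for the bond Hamiltonian (6).
# Lattice size, J, mu and T are illustrative; this is not the PALLAS programme.
rng = np.random.default_rng(1)
L, J, mu, T = 16, 1.0, 0.5, 1.0
c = rng.integers(0, 2, size=(L, L))          # occupation numbers c_i in {0, 1}

def neighbours(i, j):
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]

def energy(c):
    """Eq. (6): a bond pays -J only if neither end has any other occupied neighbour."""
    e = -mu * c.sum()
    for i in range(L):
        for j in range(L):
            for (a, b) in [((i + 1) % L, j), (i, (j + 1) % L)]:          # each bond once
                if c[i, j] and c[a, b]:
                    others_i = [n for n in neighbours(i, j) if n != (a, b)]
                    others_j = [n for n in neighbours(a, b) if n != (i, j)]
                    if all(c[n] == 0 for n in others_i) and all(c[n] == 0 for n in others_j):
                        e -= J
    return e

def metropolis_step(c):
    """Flip one randomly chosen occupation number and accept with the Metropolis rule."""
    i, j = rng.integers(0, L, size=2)
    trial = c.copy()
    trial[i, j] = 1 - trial[i, j]
    dE = energy(trial) - energy(c)           # a local recomputation would be much faster
    return trial if dE <= 0 or rng.random() < np.exp(-dE / T) else c

for _ in range(10):
    c = metropolis_step(c)
print(int(c.sum()), "particles after 10 trial moves")
```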
5. A FLOW MODEL WITH A CHAOTIC BEHAVIOUR

Next we have chosen an interesting statistical mechanics model in biophysics. The model⁸ describes a coupled system of functional units (nephrons) in the kidney. The model is basically a hydrodynamical model and its solution requires a large amount of computer power, but it can easily be simulated on the parallel computer PALLAS. In the model as well as in the experimental data there are strong indications of very complex chaotic phenomena arising from more simple chaotic attractors found in the dynamics of the single, undisturbed nephron. The simulations are based on a 2-dimensional grid of 16x16 nephrons with nearest-neighbour coupling.
A nephron is basically a long tubule with the upper (closed) end containing 20-40 capillary loops arranged in parallel. A simple, dynamical model of the pressure and flow regulation in a nephron is given by 8 coupled first order differential equations. The model emphasizes the function of the tubuloglomerular feedback loop. As argued in refs. 8,9, the slow oscillations in tubular pressure reproduced in Fig. 5 are believed to arise from temporal self-organization in that particular loop.
Due to the very dense packing of nephrons in the kidney, one can expect a certain degree of interaction between neighbouring nephrons. In the simulation the nephron system is arranged in a 2-dimensional lattice with 16x16 nodes, each representing a single nephron coupled to the nearest neighbours in all directions. Each of the 16 processing units on PALLAS simulates a small 4x4 sub-system of the lattice. This is done in parallel with respect to time and with the help of synchronous clock signals. Updating of the common borders constitutes the essential information flow between the processor units. The 8 coupled differential equations describing each nephron are solved as difference equations by the forward Euler numerical method with a small time step.
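The time-stepping scheme can be sketched as follows. Since the 8 nephron equations of ref. 8 are not reproduced in this paper, the right-hand side below is a hypothetical stand-in (simple relaxation); only the forward Euler step on a 16x16 grid with nearest-neighbour coupling of the pressure variable reflects the set-up described above.

```python
import numpy as np

# Sketch of the time stepping only: forward Euler on a 16 x 16 grid of units with
# nearest-neighbour coupling.  The actual 8 nephron equations of ref. 8 are not
# reproduced in the paper; `local_rhs` is a hypothetical stand-in for them.
rng = np.random.default_rng(2)
N, NVAR, dt, coupling = 16, 8, 1e-3, 0.2
state = rng.normal(size=(N, N, NVAR))        # 8 variables per nephron; state[..., 0] plays the pressure

def local_rhs(x):
    """Placeholder for the 8 coupled first order equations of a single nephron."""
    return -x                                # assumption: plain relaxation, for illustration only

def euler_step(state):
    """x(t + dt) = x(t) + dt * [ f(x) + nearest-neighbour coupling on the pressure ]."""
    deriv = local_rhs(state)
    pressure = state[..., 0]
    neigh = sum(np.roll(pressure, s, axis=a) for s in (1, -1) for a in (0, 1))
    deriv[..., 0] += coupling * (neigh - 4.0 * pressure)
    return state + dt * deriv

for _ in range(1000):
    state = euler_step(state)
print(float(state[..., 0].mean()))
```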
Results of the simulation⁹ are shown in Fig. 5, where the tubular pressure in one nephron out of 256 exhibits an oscillatory behaviour. The first curve, with no coupling to neighbours, shows regular oscillations, while the second curve, with a weak coupling, shows irregular oscillations. Figure 6 shows the corresponding phase portraits of the simulated pressure oscillations with an attractor.

FIGURE 5. A. Experimental results showing oscillations in the proximal intratubular pressure PT inside a rat kidney. B. Simulation results showing the pressure PT in a single nephron out of 256 nephrons with no nearest neighbour coupling. C. Similar results as in B but with a weak coupling; often the oscillations become much more irregular than shown. (In B and C, Pa = 100 mm Hg and the coupling strengths are 0 and 0.2, respectively; PT is in mm Hg and time in minutes.)
FIGURE 6. Reconstruction of 3-dimensional phase portraits from a single time series using Takens' scheme. Data correspond to the situation in Fig. 5 C. A and B are the same object viewed from different angles.
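Takens' reconstruction amounts to plotting delayed copies of one time series against each other. The sketch below builds such 3-dimensional phase-space points from a single signal; the test signal and the delay value are illustrative assumptions, not the values used for Fig. 6.

```python
import numpy as np

# Minimal sketch of Takens' delay embedding: one pressure time series is turned
# into 3-dimensional phase-space points.  Signal and delay are illustrative.
t = np.arange(0.0, 60.0, 0.01)
p = np.sin(2 * np.pi * t / 3.0) + 0.3 * np.sin(2 * np.pi * t / 0.7)   # stand-in for P_T(t)

def delay_embed(series, dim=3, delay=25):
    """Rows are the points (x(t), x(t + delay), x(t + 2*delay), ...)."""
    n = len(series) - (dim - 1) * delay
    return np.column_stack([series[i * delay: i * delay + n] for i in range(dim)])

portrait = delay_embed(p)          # shape (n_points, 3); plot any pair of columns
print(portrait.shape)
```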
REFERENCES

1. O.A. McBryan and E.F. Van de Velde, Hypercube algorithms and implementations, SIAM Journal on Scientific and Statistical Computing 8 (1987) 227.

2. H. Bohr, T. Petersen, B. Rathjen, E. Katznelson and A. Nobile, Computer Physics Communications 42 (1986) 11.

3. Texas Instruments TMS 32010, User's Guide, P. Strzelecki, ed. (1983).

4. E.J. Lerner, High Technol. 20 (1985).

5. F. Fucito, E. Marinari, G. Parisi, C. Rebbi, et al., Nuclear Phys. B180 (FS2) (1981) 389.

6. J.B. Kogut, Rev. Mod. Phys. 51 (1979) 659.

7. M. Parrinello and E. Tosatti, Phys. Rev. Lett. 49 (1982) 1165.

8. K.S. Jensen, E. Mosekilde and N.-H. Holstein-Rathlou, Mondes en Développement, Tome 14, No. 53 (1986).

9. K.S. Jensen, H. Bohr, N.-H. Holstein-Rathlou, T. Petersen and B. Rathjen, in: Proceedings of the Workshop on Statistical Physics, eds. H. Bohr and O.G. Mouritsen (DTH, Lyngby, Denmark, 1987) p. 137.