Computer Physics Communications 26 (1982) 423-431
North-Holland Publishing Company

SIMULATION OF PARTICLE PROBLEMS IN ASTROPHYSICS

R.A. JAMES
Department of Astronomy, University of Manchester, Manchester M13 9PL, UK
Although particle-mesh simulation techniques have proved extremely valuable in the study of galactic evolution, the high cost of the calculations on serial computers and the limited resolution which can be employed have severely restricted applications. The vector and array processors now available offer substantial improvements. The CRAY-1S computer provides a speed improvement by a factor between 7 and 10 as compared with the CDC 7600. A substantial amount of program revision is needed to obtain this improvement, but the basic algorithms used are unchanged. Use of the ICL DAP involves substantial changes in the algorithms employed. This machine also offers considerable improvements over serial machines. Both the CRAY-1S and the ICL DAP allow us to run reasonably realistic simulations at acceptable cost.
1. Introduction

Astrophysicists use particle simulation techniques to study the dynamics of single galaxies and systems of a few galaxies, and also to follow the development of galaxy clustering in the early Universe. The method is particularly important in stellar dynamics, where theoretical predictions cannot be checked by experiment. Many compromises have to be made in designing simulations, and they are not a completely satisfactory substitute for experiment. However, the physics we put into our models is simple and well understood. Their behaviour tells us about the physics of poorly understood collective interactions without allowing our prejudices to determine the outcome. It seems legitimate to use them as a partial substitute for experiment. This is not the method's only justification. We have complete information about the behaviour of individual particles in our models and may sample them to study in detail the mechanisms involved in generating bar and spiral structure in disc galaxies and maintaining non-symmetric structure in elliptical systems. We may also study in detail the response of galaxies to encounters and collisions. Such information cannot be obtained observationally. Even if it were possible to carry out laboratory experiments, it would not be easy to recover.
The calculations are expensive, but the new generation of vector and array processors promises to reduce their cost and thus permit more realistic investigations than have been possible in the past. In section 2 of this paper we outline the particle-based techniques currently used in astrophysics. Section 3 discusses a number of applications and shows how desirable it is to improve resolution and increase problem size. Section 4 describes some of the features of our current simulation code for the CRAY-1S, and section 5 discusses the revisions which are being implemented on the ICL DAP at Queen Mary College. Section 6 is a summary of our conclusions.
2. Numerical techniques

In a particle simulation we represent the system by several thousand model particles and follow their motion in their own gravitational field. This may be supplemented by contributions from companion systems or massive haloes. Two versions of the method are currently used in astrophysics. One, developed by Aarseth and his co-workers, follows the motion of each particle with a variable time step chosen to match the particle's circumstances. At each step we determine the gravitational field due to the rest of the system, interpolating in time as required. The numerical work
required is proportional to the square of the number of particles, so simulations using more than a few thousand are not economic. Despite this the method has many merits. In particular we note that there is no need for an auxiliary mesh, and we can conveniently follow (for example) galactic encounters in which the participants are widely separated for much of the time. However, the author has not heard of any adaptation of this technique for vector processors, and we will not discuss it further here. The alternative procedure is that developed by Hockney, Hohl, Miller, Prendergast and many others. The time step is the same for all the particles and does not change during the calculation. At each time step we assign the masses of the particles to an auxiliary mesh, using one of a number of systematic allocation schemes. We use a direct potential solver based on the fast Fourier transform to determine the potentials at mesh points. This enables us to find the gravitational field at mesh points and interpolate for the field seen by individual particles. If we use the same interpolation weights for mass assignment and force interpolation we conserve linear momentum to machine accuracy. We stagger the times at which we hold positions and velocities, to improve the accuracy of the time integration and to make the procedure time-reversible [1]. This technique applies to systems containing particles only. It is appropriate when the mean free path against binary encounters is long, and the force field seen by any particle is the fairly smooth field due to the rest of the system. In other applications special procedures are used to improve the contribution to the field from nearby particles. This is not necessary in stellar dynamics, as the binary relaxation time for stars in a typical galaxy greatly exceeds the age of the Universe. Galaxies contain gas as well as stars, and a number of attempts have been made to include this gas in the models.
Unfortunately our picture of the physical state of the interstellar medium is extremely uncertain, and the author feels it is a little premature to incorporate large scale hydrodynamic calculations into particle-mesh models at present. It does seem useful, however, to explore the purely gravitational effects that the model does include.
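The time-step cycle just described (mass assignment to the mesh, a direct FFT-based potential solve, force interpolation with the same weights, and a staggered leapfrog update) can be sketched in a one-dimensional periodic toy model. The cloud-in-cell weighting, unit constants and gradient-based mesh force below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def assign_mass(pos, mass, n_mesh):
    """Cloud-in-cell mass assignment: each particle shares its mass
    between the two nearest mesh points (1D, periodic, unit spacing)."""
    i = np.floor(pos).astype(int) % n_mesh
    f = pos - np.floor(pos)
    rho = np.zeros(n_mesh)
    np.add.at(rho, i, mass * (1 - f))
    np.add.at(rho, (i + 1) % n_mesh, mass * f)
    return rho

def mesh_potential(rho, g=1.0):
    """Direct FFT Poisson solve on the mesh (periodic; mean mode dropped)."""
    k = 2 * np.pi * np.fft.fftfreq(len(rho))
    k2 = k**2
    k2[0] = 1.0                 # avoid division by zero at k = 0
    phi_k = -4 * np.pi * g * np.fft.fft(rho) / k2
    phi_k[0] = 0.0
    return np.fft.ifft(phi_k).real

def interpolate_force(force_mesh, pos):
    """Interpolate mesh forces back to the particles with the SAME
    weights used for mass assignment, conserving linear momentum."""
    n = len(force_mesh)
    i = np.floor(pos).astype(int) % n
    f = pos - np.floor(pos)
    return force_mesh[i] * (1 - f) + force_mesh[(i + 1) % n] * f

def leapfrog_step(pos, vel, mass, n_mesh, dt):
    """Staggered (time-reversible) update: velocities are held half a
    step away from positions; the mesh force here uses a crude gradient."""
    rho = assign_mass(pos, mass, n_mesh)
    phi = mesh_potential(rho)
    force_mesh = -np.gradient(phi)
    vel = vel + interpolate_force(force_mesh, pos) * dt
    pos = (pos + vel * dt) % n_mesh
    return pos, vel
```

Because the same weights appear in `assign_mass` and `interpolate_force`, the mutual forces between any pair of particles cancel to round-off, which is the momentum-conservation property mentioned in the text.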
3. The need for large simulations

The accuracy of a simulation is affected by the resolution of the mesh used for potential determinations and by the number of particles involved. We note that fine meshes require the use of a sufficiency of particles. A fine mesh is of little use if sampling errors become important in the potential values. We usually smooth interparticle forces over one or two mesh intervals. Reducing the mesh interval enhances unwanted binary effects unless we increase particle number to compensate. Sellwood [2] has compared growth rates of unstable modes for a galactic disc with theoretical predictions due to Kalnajs and Toomre. His results are at present confined to "cold" discs, in which the initial particle velocities are those required for circular orbits. He examined the growth rates of a particular rapidly growing mode and found them to vary by a factor of 2.5. The rate depends mainly on the particle number, but the variation is not systematic. His largest simulation, with 160000 particles, gave a growth rate differing from the theoretical prediction by 15%. The errors seem to arise from Poisson noise in the particle distribution. A careful choice of initial particle positions greatly improves the situation, with 3% errors in the growth rate for as few as 20000 particles. This is a valuable and encouraging test of the model. However, it seems unsafe to conclude that we only need 20000 or so particles. It is not yet clear how "warm start" simulations (in which we add a random component to the initial circular velocities) will behave. A further question is how smooth the initial density distribution should be. Sellwood's results may simply mean that our models should reproduce the initial distribution rather accurately. This requires large numbers of particles for a lumpy distribution, and raises difficult questions concerning the choice of starting conditions.
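The scale of the Poisson noise referred to above is easy to demonstrate numerically; the cell count and loading below are arbitrary illustrative values, not taken from Sellwood's runs:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, mean_per_cell = 64, 100

# random particle positions (as in a "warm start" without careful placement):
# the count in each cell fluctuates like a Poisson variable
counts = np.bincount(rng.integers(0, n_cells, n_cells * mean_per_cell),
                     minlength=n_cells)
rel_fluct = counts.std() / counts.mean()   # expect ~ 1/sqrt(100) = 0.1
```

A 10% density fluctuation per cell at 100 particles per cell illustrates why halving the mesh interval (which cuts the particles per cell by a large factor) demands a compensating increase in particle number.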
It is instructive to look at some typical problems to see why they require the spatial resolution available on machines like the CRAY-1S or the DAP.
3.1. Disc galaxy simulations

Many disc galaxies exhibit bars and/or spiral structure with approximate two-fold rotational symmetry. To investigate the generation and maintenance of such patterns we must ask how the stars in the disc respond to gravitational perturbations due to the pattern. We find that there are three widely separated resonances. Fig. 1 shows a sample of orbits seen in a frame rotating with a pattern consisting of a central bar. The rotation curve for a typical disc galaxy is nearly flat over a wide range of radii, with linear velocities in the range 200-300 km s⁻¹, so the angular velocity decreases with radius. At some radius stars in circular orbits co-rotate with the pattern. Such an orbit is marked by the point labelled CR in the figure. We now consider a star in a smaller non-circular orbit, labelled I in the figure. The orbit, as a rule, has no simple relation to the pattern. The apocentre longitude advances relative to the pattern between successive apocentres. An interesting situation arises for the orbit labelled ILR, where the longitude advance is 180° so that the orbit is closed. It is approximately an ellipse centred on the origin and rotating with the pattern. With certain rotation laws, Lynden-Bell [3] has shown that the orbit can be trapped by the bar so that the apsidal line oscillates about it. This is part of the mechanism that establishes and maintains bars. Stars with this type of relation to the pattern are
said to be in inner Lindblad resonance with it, whether or not their orbits line up with the bar. There is also an outer Lindblad resonance. The apocentre longitude of a star outside the co-rotation region decreases relative to the pattern between apocentres. If the decrease is 180° we obtain an orbit of the type marked OLR in the figure. Such orbits usually try to align themselves with apsidal line at right angles to the bar. The response of the disc to the pattern is dominated by the Lindblad and co-rotation resonances. The co-rotation and outer Lindblad resonances are likely to be relevant to most observed spirals, while the inner resonance is probably involved in bar formation. For nearly circular orbits the resonances are narrow, and a flat rotation curve implies that their radii have ratios 5.8 : 3.4 : 1. In practice these ratios are distorted by deviations from flatness. In addition the resonances are considerably broadened by non-circular motion and by the effect of the asymmetric field of the bar. However, it is clear that the physically significant regions are inconveniently well separated. In fig. 2 we see some pole-on views of a late stage in a simulation performed on the CDC 7600 [4]. Analysis of a sample of orbits shows that the bar is formed according to Lynden-Bell's mechanism. The leading and trailing structures (not predicted by the theory) arise from collective oscillations of the apsidal lines of the orbits and are only just resolved by the mesh. The co-rotation and outer Lindblad resonances are outside the region of appreciable surface density; the disc had to be cut off to fit on the mesh. The bar length is about 12 mesh intervals, and there is no scope for reducing it relative to the mesh on the CDC 7600. Experiments to confirm this result at higher resolution and to explore models with an active outer Lindblad resonance are being run on the CRAY-1S, but the results have not yet been analysed.
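The quoted 5.8 : 3.4 : 1 ratios can be checked in the epicyclic approximation. The assumptions below (a two-armed pattern, resonance conditions Ω ± κ/2 = Ω_p, and κ = √2 Ω for a flat rotation curve) are standard but not spelled out in the text:

```python
import numpy as np

# Flat rotation curve: v = const, so Omega(r) = v/r and the epicyclic
# frequency is kappa = sqrt(2) * Omega.  For an m = 2 pattern with
# pattern speed Omega_p the resonance conditions are:
#   inner Lindblad resonance: Omega - kappa/2 = Omega_p
#   co-rotation:              Omega           = Omega_p
#   outer Lindblad resonance: Omega + kappa/2 = Omega_p
# Since r is proportional to 1/Omega, the radii in units of r_CR are:
r_ilr = 1.0 - 1.0 / np.sqrt(2.0)
r_cr = 1.0
r_olr = 1.0 + 1.0 / np.sqrt(2.0)
ratios = np.array([r_olr, r_cr, r_ilr]) / r_ilr   # ≈ 5.83 : 3.41 : 1
```

The result reproduces the ratios in the text and makes the awkward separation of the physically significant regions explicit: the outer Lindblad radius is nearly six times the inner one.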
Fig. 1. Orbits in a disc galaxy, seen in a frame rotating with a central bar. The arrows show the direction of motion in the rotating frame. The labels are defined in the text.

Fig. 2. Behaviour of a model of a disc galaxy (run 531), shown at times from 1800 to 2350. The times are given in units of 4.03 × 10⁶ years. The galaxy rotates clockwise.

3.2. Elliptical galaxies

These are of course ellipsoidal; the term 'elliptical' is common usage even though it is not really appropriate for three-dimensional objects. A major problem concerns the maintenance of the ellipsoidal form over a Hubble time. Binney [5] argues that anisotropic velocity distributions may develop during formation and be capable of maintaining the asymmetry. The argument is supported by a 500 particle simulation due to Aarseth and Binney [6] and by Schwarzschild's [7] self-consistent model. Wilkinson and James [8] used a 33³ mesh and 25000 particles to follow two models for 4 × 10⁹ years and found that the velocity anisotropy persists indefinitely. The orbits fall into a few
simple types related to the structure of the system. This model is severely resolution limited, and the density profile near the centre is less sharply peaked than in observed systems. It would be possible to fit a 65³ mesh into the CRAY-1S at Daresbury, but the particle storage and auxiliary work space required would be too extensive with the present code. We intend to revise the handling of some of the particles and possibly use half-word packing for the mesh values to overcome this problem.

3.3. Collisions and multiple systems

The most extensive current simulations of galactic collisions are those of Miller and Smith [9]. They were performed on a 64³ mesh with about 100000 particles, using the ILLIAC IV machine. The necessity of fitting both participants into the mesh at the beginning of the calculation restricted the study to deeply penetrating collisions. Such a calculation would not have been a reasonable proposition on a CDC 7600 or an IBM 195. A larger mesh than Miller and Smith's would be needed to study glancing collisions or multiple systems. The interest of such work is made clear by, for example, the recent observations of Appleton, Davies and Stephenson [10] on the M81/M82 group. This is a cluster of 4 galaxies embedded in a large and dynamically complicated hydrogen cloud. Bridges of emission which connect M81 to its three companions, together with various other features, seem to arise from tidal interactions. The extremely large meshes needed to model such systems could be provided by using the algorithm of Burridge and Temperton [11], which requires memory for only a few mesh planes at a time. The computational demands would, however, be heavy. Adequate resolution within the galaxies would require at least a 257³ mesh (about 1.7 × 10⁷ points). Fourier transformation on the CRAY-1S would take over 20 s per time step, and to this we must add an allowance for finding potentials in transform space, and for a surface convolution to give the boundary potential [12]. The file transfer waiting time may be a problem as the Daresbury machine lacks an I/O subsystem. The mesh segmentation algorithm [4] discussed in the next section would reduce computational and file transfer requirements by eliminating potential determinations in empty regions. It may make the calculation feasible on the Daresbury machine.

4. Particle-mesh simulation on the CRAY-1S

The discussion of this section refers to the code which we have developed from a version written for the CDC 7600 at UMRCC [13,14]. In adapting the code we found that the changes required in the potential solver were fairly minor, but that a fair amount of reorganisation was necessary to exploit the CRAY vector facilities in the particle handling section. The largest single component of the work of the potential solver lies in the Fourier transformation routines. The procedure is due to Ziegler [15], and requires about 2.5 N log₂N operations for a transform of length N. On the CRAY-1S we hold a mesh of dimensions N1, N2, N3 in a one-dimensional array in memory, with element (r, s, t) at displacement (r N2 N3 + s N3 + t) from element (0, 0, 0). When transforming in the r (or t) direction the increment between corresponding elements of successive transforms is 1 (or N3) and there are N2 N3 (or N1 N2) independent transforms. The inner loop is a vector loop over the transforms, so that we combine entire planes of values at each folding stage, obtaining excellent vector performance. The increment for transformation in the s direction is usually 1, but becomes (N2 − 1) N3 + 1 at the end of a line. This forces us to combine lines rather than planes when performing this stage of the transformation. The average result rate over all the transforms is about 100 Mflops for the Fourier transformation, a gain of a factor of 25 over the CDC version of the code. Other parts of the potential solver proved more difficult to adapt, essentially because they had been optimised for a different architecture. At the conclusion of this stage the potential solver ran with a speed factor of 9.25 over the CDC 7600. Subsequent changes in the CFT compiler increased this factor to 10. Before we modified the code it was less than 2. The main problem in the particle handling arises from the mass assignment and force interpolation
procedure. These require that each particle sends information to nearby points of the mesh and receives it from the same points. We divide the region into cells defined so that all particles in a cell talk to the same mesh points. Efficient vector operation requires that we assemble all the particles for a cell into contiguous areas in memory, and that each cell contains a reasonable number of particles. The second requirement cannot be met. Galaxies contain large density contrasts and many cells contain only a few particles. Fortunately the organisation of the CDC 7600 code allowed us to avoid this problem to some extent. The memory on the UMRCC machine is too restricted to hold all the particles in even a small simulation, so we keep them in a serial file. The cells are grouped into square blocks, one cell thick and four or eight cells to a side. The particles belonging to a block form a record on the file. Only one block is held in memory at a time, and most of its particles remain resident when pushed. Those which wander into other blocks are stored in a wanderer area. Those which wander into blocks waiting to be processed are merged with the residents when the block is cleared to file. The remainder are picked up on a subsequent pass through the file. We use the conventional link-list procedure to connect residents to their new cells and wanderers to their new blocks. The procedure allows simulations requiring much more particle storage than is available in memory. It also improves vector performance on the CRAY. Of the arithmetic required for particle pushing, only the summing of mass contributions, and the multiplication of weights by forces at mesh points, need be done at cell level. The actual weights depend only on the position of the particle relative to the cell boundaries. Their determination is a major part of the particle calculation, and can be done at block level.
Similarly the positions and velocities can be incremented at block level. Some of the arithmetic and organisational loops are sufficiently complicated to force the compiler to write intermediate results to memory. This degrades the performance. One example is the subroutine which moves particles after velocity incrementation, revises their cell and block allocations, marks stars which wander off the mesh, and links the others back into the block if they are still resident or into the wanderer list if they are not. An Assembly Language alternative does not need to store intermediate results. On a test problem (17 × 33² mesh points and 25000 particles) the speed ratio between the Assembly Language and Fortran versions was 2.7. The CPU time saved is a little over 10% of the particle handling time. This is not a consequence of a failure to vectorise the Fortran routine. The only loop not vectorised by CFT is the re-linking loop, which is of necessity scalar coded in the CAL version. Assembly coding in this and in eight other routines makes an important contribution to the performance. We may compare the time required for identical runs on the CRAY-1S and the CDC 7600 for a small problem. In the case above the speed ratio is 6.8. We estimate that a CDC 7600 with sufficient memory to accommodate the problem would take 12 s per step for a mesh of 9 × 65² points and 100000 particles, or 14.5 s with 150000 particles. The measured CRAY-1S times are 1.096 and 1.398 s per step, respectively. While this is not a genuine comparison between measured times, it does give an indication of the gains possible with a vector architecture. It should be noted that the times quoted are for ordinary time steps in which very little analytical work is done. At every tenth time step we extract information for subsequent analysis at UMRCC. This is fairly costly, and the time taken for 10 steps is 13.2 s with 100000 particles, or 17.2 s with 150000. The comparisons above take no account of the input and output waiting times on the Daresbury CRAY. Double buffering and careful organisation on the CDC 7600 give 80% utilisation of the CPU on that machine. The CRAY code packs star information from the 7 words (6 coordinates and one link/tag word) used internally to 3 words for the file representation. It uses the BUFFER IN and BUFFER OUT statements as the basis of a double buffering system.
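The 7-to-3 word packing for the file representation can be illustrated in miniature. The field widths and layout below are hypothetical (the paper does not give them); only the mechanism of holding two reduced-precision values in each 64-bit CRAY word is the point:

```python
# Hypothetical packing: two 32-bit fields per 64-bit word, so six
# coordinates fit in three words for the file representation.  The real
# field widths and the handling of the link/tag word are not specified
# in the text.
MASK32 = 0xFFFFFFFF

def pack_pair(hi, lo):
    # place one value in the upper half of a 64-bit word, one in the lower
    return ((hi & MASK32) << 32) | (lo & MASK32)

def unpack_pair(word):
    return (word >> 32) & MASK32, word & MASK32

# six coordinates of one star -> three file words
words = [pack_pair(a, b) for a, b in [(1, 2), (3, 4), (5, 6)]]
```

Packing halves the file traffic at the cost of pack/unpack arithmetic on every transfer, the trade the text describes.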
Nevertheless the current situation is less than satisfactory. Steps are being taken to halve the transfers required on the particle information file, but it is not possible at present to estimate the likely gain. This overhead is not likely to be a problem on a machine with an
I/O subsystem. It is however important to make improvements for the Daresbury machine, as we have no interest in running problems small enough to share memory with other jobs. At present the code is used for problems with up to 150000 particles and 17 × 65² mesh points. This gives valuable gains in resolution over the CDC 7600 version of the code. Our future plans are as follows:

i) We intend to revise the wanderer handling so that stars wandering into a block that has already been processed are sent to a temporary file, to release space for other purposes. This should allow us to use up to 250000 particles on the Daresbury machine, and possibly to use meshes of 9 × 129² points.

ii) To introduce an option to pack two potentials into a single word. This would not entail any loss of accuracy compared with the present code, which already packs particle coordinates. In conjunction with (i) this should enable us to run elliptical galaxy simulations on meshes of 65³ points.

iii) The current code scans the particle data twice in a time step. These two scans will be merged into one to save input and output.

iv) The algorithm of Burridge and Temperton would not be compatible with the convolution process for potential determination that is frequently used in galactic simulations. It is however compatible with our corrective charge algorithm [12]. In section 3.3 we concluded that it is too expensive for meshes of 257³ points. However, it looks most attractive for smaller meshes. In simulating an ellipsoidal system it would permit use of a mesh of 129³ points, with a further increase in resolution. For a disc galaxy we could use 9 × 257² points and reduce the mesh interval to the thickness of the disc. This would enable us to obtain some resolution of the vertical structure. Despite the fairly considerable re-organisation required the development seems worthwhile.

v) The mesh segmentation scheme [4] mentioned in section 3.3 is expected to be a major undertaking.
The algorithm is a development of our present corrective charge procedure and depends on partitioning the mesh into rectangular submeshes. In each of these we solve the Poisson equation with a zero boundary condition. By
juxtaposing these submeshes we obtain a potential which satisfies Poisson's equation at points not in the partitions. We calculate corrective charges on the partitions, determine the partition potentials and then correct the submesh potentials by the usual equivalent layer procedure. At no time do we need to hold more than a small fraction of the mesh in memory. As compared with Burridge and Temperton's algorithm, the procedure is likely to require somewhat less arithmetic and to have much smaller file transfer demands. It is, however, complicated, and its advantages for problems involving single galaxies may not justify the development cost. Its real attraction is for modelling small multiple systems of galaxies, such as the M81/M82 group, where the members only occupy a few per cent of the volume. It should be noted that its potential usefulness is not restricted to astrophysics. It is essentially a way of constructing a solution for the potential by combining solutions in subregions, and could be of value in situations with awkward geometry.
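A one-dimensional toy analogue of the segmentation idea can be sketched. This is an assumption-laden illustration: the real algorithm obtains the correction from corrective charges on the partitions and a small partition-sized solve, whereas the sketch below corrects with a full-mesh solve purely to show that juxtaposed zero-boundary solutions plus an equivalent-layer correction recover the global potential:

```python
import numpy as np

def laplacian_matrix(n, h):
    # standard second-difference matrix for -u'' with zero boundary values
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

n, h, m = 31, 1.0 / 32, 15                 # interior nodes, spacing, partition node
f = np.sin(np.pi * np.linspace(h, 1 - h, n))

# step 1: zero-boundary solves on the two submeshes either side of the partition
v = np.zeros(n)
v[:m] = np.linalg.solve(laplacian_matrix(m, h), f[:m])
v[m + 1:] = np.linalg.solve(laplacian_matrix(n - m - 1, h), f[m + 1:])

# step 2: the juxtaposed solution satisfies the difference equations at every
# node except the partition node itself; a correction driven by the residual
# there (the "corrective charge") restores the global solution
A = laplacian_matrix(n, h)
residual = f - A @ v                       # non-zero only at the partition node
v = v + np.linalg.solve(A, residual)

u = np.linalg.solve(A, f)                  # direct global solve for comparison
```

The residual is confined to the partition, which is why, in the real scheme, determining the partition potentials is a much smaller problem than the original mesh solve.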
5. Particle-mesh simulation on the ICL DAP

Our work on the DAP has not reached the same level of development as on the CRAY-1S. This is partly because the CRAY development started from a well-established code and did not involve major algorithmic changes. A subsidiary difficulty was that use of the QMC machine from Manchester became possible only at the beginning of 1981. It is not possible for me to offer performance comparisons with the CRAY-1S. Instead we will discuss the effect of the DAP architecture on the way we implement a simulation.

In our simulations we scale positions and velocities to standard ranges in order to save arithmetic. Floating point operation has no advantages, and we use it on the CRAY only because the machine is designed to do it efficiently. On the DAP we use fixed point working throughout. It is faster and more economical on memory. At present we plan to use word lengths of 24 bits for positions and 18 bits for velocities when working on a 65³ mesh. This will require assembly code subroutines for the arithmetic, picking up the lengths actually required from common areas. The packing and unpacking in the CRAY code will no longer be necessary.

The link-list procedure used on the CDC 7600 and the CRAY-1S is quite inappropriate to the DAP. Instead we sort particles in parallel. At the beginning of a time step we hold them in array mode, ordered according to cell number. We abandon the block organisation used on serial and vector machines. For problems of reasonable size the particle information greatly exceeds the memory associated with the DAP. We hold it in a file which can be accessed through the Block Transfer System, so that transfers can be overlapped with computation. We transfer sets of 4096 particles into long vectors in the DAP, using the host as auxiliary memory only, so that there is no need to re-format data before or after a transfer. Pushing the particles disorders the sequence, so they must be re-ordered. One attractive sorting technique is based on Batcher's odd-even merge [16]. This process merges two ordered sequences by merging the even members and the odd members separately. On interleaving the resulting subsequences we find that any disorder can be removed by selective interchanges of adjacent pairs. The merging of the even and odd subsequences can be achieved by the same process. We perform a series of stages of division into even and odd parts which is reminiscent of the successive range halvings of the fast Fourier transform. The operation count for sequences of length N (≤ 4096) on the DAP is proportional to log₂N. To order a completely disordered sequence we build up ordered subsequences of increasing length, so the operation count is roughly proportional to (log₂N)². With sequences of length exceeding 4096 the operation count obviously increases. It is not necessary to do a complete ordering in the present application, as particle pushing does not completely disorder the sequence. To change cell a particle must wander through one or more of the upper, lower, North, South, East or West boundaries. The subset of particles which remains in the cell is still internally ordered, as is the subset of particles wandering through any particular boundary without crossing any other. We may copy these ordered subsets to working areas and compress each of them into an ordered sequence. We obtain 7 ordered subsequences and a short disordered subsequence containing particles which cross 2 or 3 boundaries. We sort the last sequence completely and merge the 8 ordered subsequences in pairs by the odd-even process. Once the particles have been re-ordered we use a relative of the compression procedure to accumulate contributions to the mesh point masses, and distribute them to the correct places in the mass table by re-expansion. The same process of compression and re-expansion moves forces from mesh points to the particles requiring them. We then calculate particle accelerations in parallel for 4096 particles at a time, whatever the distribution of particles over cells. The bulk of the time needed on the DAP is used to move bits from one processor to the next, rather than for arithmetic. This is a much simpler operation than performing arithmetic with the bits, but it takes essentially the same time. There may be a possibility of accelerating this part of the operation by hardware means.

For determining potentials we have an interesting alternative to the normal fast Fourier based procedures [17,18]. For the sake of simplicity we will discuss this in terms of a periodic convolution of the density with the Green's function, as in the alternative to the corrective charge algorithm. The possibility of using a transform approach to convolution depends on the existence of an α such that the equation

α^N = 1   (1)

has N distinct roots. For complex numbers the roots are

α = exp(2πir/N),   r = 0, 1, ..., N − 1,   (2)

and this leads to the Fourier transform. For composite N this property also implies that we can construct a fast transform procedure. We are not restricted to complex numbers on the DAP. As an example, consider the case of a transform of length N = 64. We perform arithmetic in the ring of integers modulo M = 2³² + 1, and consider α = 2^r, 0 ≤ r ≤ 32. Clearly

α⁶⁴ = (2^r)⁶⁴ = (2³²)^(2r) = (−1)^(2r) = 1 mod M   for r ≠ 0.   (3)

This gives 33 distinct roots of eq. (1). The inverses (in the ring) for 1 ≤ r ≤ 31 give the others. The twiddle factors in a fast transform procedure for this transform length can all be powers of 2. The multiplications in the transformation may be replaced by shifts and additions, and are much faster than normal multiplications on the DAP. The convolution is accurate modulo M, so our results are exact if the data and the Green's function do not contain more than 16 bits of information. If we require more accuracy we can work modulo 2⁶⁴ + 1 and discard excess bits at the end. Working modulo 2³² + 1 we may expect to perform about 4 × 10⁸ operations per second. The potential calculation should be extremely fast on the DAP. To compare the CRAY and the DAP over the entire calculation does not seem possible at present, but the DAP is likely to offer the performance we expect of a class 6 machine.
6. Conclusion

Current serial computers lack the power to run realistic simulations in stellar dynamics. The new generation of vector and array processors enables us to build more realistic models. The resolution available will justify some confidence in our conclusions for disc galaxies and may do so for elliptical systems as well. With the aid of some mathematical developments we can hope to model systems which are completely beyond the range of current serial machines. Codes for the CRAY-1S can be developed from those for serial machines and offer a speed factor over the CDC 7600 in the range 7 to 10. The main problems of the Daresbury system are the limited memory and the slow file transfers. A machine with an I/O subsystem would be more efficient for our application. Effective exploitation of the ICL DAP demands a complete revision of the algorithms employed. This is sometimes considered to be a disadvantage, but the author cannot accept this view. The DAP
promises to be a most effective machine for this type of work, and its descendants may well prove to have overwhelming advantages.

Acknowledgements

I would like to thank Dr. J.A. Sellwood for communicating his results to me in advance of publication. Our work on the CRAY-1S and the DAP is supported by the Science and Engineering Research Council under grants numbered SGC/11108 and SGC/12464. It is a pleasure to acknowledge their support. Fig. 2 is reproduced by courtesy of D. Reidel Publishing Co.
References

[1] O. Buneman, J. Comput. Phys. 1 (1967) 517.
[2] J.A. Sellwood, private communication (1981).
[3] D. Lynden-Bell, Mon. Not. Roy. Astron. Soc. 187 (1979) 101.
[4] R.A. James, in: Investigating the Universe, ed. F.D. Kahn (Reidel, Dordrecht, 1981).
[5] J.J. Binney, Mon. Not. Roy. Astron. Soc. 177 (1976) 19.
[6] S.J. Aarseth and J.J. Binney, Mon. Not. Roy. Astron. Soc. 185 (1978) 227.
[7] M. Schwarzschild, Astrophys. J. 232 (1979) 236.
[8] A. Wilkinson and R.A. James, Mon. Not. Roy. Astron. Soc. 199 (1982) 171.
[9] R.H. Miller and B.F. Smith, Astrophys. J. 235 (1980) 421.
[10] P.N. Appleton, R.D. Davies and R.J. Stephenson, Mon. Not. Roy. Astron. Soc. 195 (1981) 327.
[11] D.M. Burridge and C. Temperton, J. Comput. Phys. 30 (1979) 145.
[12] R.A. James, J. Comput. Phys. 25 (1977) 71.
[13] R.A. James and J.A. Sellwood, Mon. Not. Roy. Astron. Soc. 182 (1978) 331.
[14] J.A. Sellwood, Ph.D. Thesis, University of Manchester (1978).
[15] H. Ziegler, IEEE Trans. Audio Electroacoust. AU-20 (1972) 353.
[16] D.E. Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching (Addison-Wesley, Reading, MA, 1973) p. 224.
[17] R.C. Agarwal and C.S. Burrus, IEEE Trans. Acoust. Speech Signal Process. ASSP-22 (1974) 87.
[18] P.M. Flanders, D.J. Hunt, S.F. Reddaway and D. Parkinson, in: High Speed Computer and Algorithm Organization, eds. D.J. Kuck, D.H. Lawrie and A.H. Sameh (Academic Press, New York, 1977) p. 113.