0097-8485/93 $6.00 + 0.00 Copyright © 1993 Pergamon Press Ltd
Computers Chem. Vol. 17, No. 3, pp. 323-325, 1993 Printed in Great Britain. All rights reserved
APPLICATION NOTE

PERFORMANCE OF IBM RISC 6000 WORKSTATIONS IN ELECTRON CORRELATION CALCULATIONS
DAVID MONCRIEFF¹* and STEPHEN WILSON¹,²
¹Supercomputer Computations Research Institute, Florida State University, Tallahassee, FL 32306-4052, U.S.A. and ²Rutherford Appleton Laboratory, Chilton, Oxfordshire OX11 0QX, England

(Received 10 November 1992; in revised form 1? May 1993)
Abstract-The performance of the IBM RISC 6000 (RS6000) family of workstations in electron correlation energy calculations using many-body perturbation theory is measured. The performance characteristics are compared with data recently obtained for the CRAY Y-MP C-90 computer. Price-performance ratios are also compared.
*Author for correspondence.

The ab initio determination of the electronic structure of molecules demands considerable computational resources, especially when account is taken of electron correlation effects (Wilson, 1984). The most demanding of the components of the correlation energy in the widely used perturbation theory approach to the electron correlation problem is that associated with fourth-order diagrams involving triply excited intermediate states, which leads to an algorithm scaling as the seventh power of the number of basis functions employed (Wilson & Moncrieff, 1991; Wilson & Saunders, 1980; Wilson & Guest, 1980; Baker et al., 1991). We have recently investigated the performance characteristics of a number of supercomputers when used to evaluate this component of the correlation energy, including the CRAY Y-MP C-90 machine, for which rates of execution in excess of 13.3 GFLOPS have been observed on a dedicated 16 processor system (Moncrieff et al., 1992a).

Recent years have seen the development of increasingly powerful workstations, such as the IBM RS6000 and HP700 series. It is the purpose of this note to compare the performance of the IBM RS6000/320H and RS6000/550 workstations in a many-body perturbation theory treatment of electron correlation effects with that achieved on a state-of-the-art supercomputer, the CRAY Y-MP C-90.

The characteristics of the workstations used in the present study are collected in Table 1. The RS6000 machines can execute one floating point instruction per cycle but this may be a compound instruction, such as a floating point add/multiply, which we count as two floating point operations. The theoretical peak performances, \(\rho_{\mathrm{cpu}}(\mathrm{peak})\), given in Table 1 are calculated by assuming that two floating point operations are performed in each cycle. The data presented in Table 1 should be compared with the corresponding data for the CRAY Y-MP C-90, which has a cycle time of 4.2 ns with a maximum of four floating point operations per clock period, giving a \(\rho_{\mathrm{cpu}}(\mathrm{peak})\) of ~1000 MFLOPS on each processor and a \(\rho_{\mathrm{wc}}(\mathrm{peak})\), the theoretical peak performance per wall clock second, of ~16 GFLOPS for a 16 processor CRAY Y-MP C-90.

We can define the price-performance ratio, \(P_m\), as \(c : \rho\), where \(c\) is the cost (in U.S.$), \(\rho\) is the rate of computation in MFLOPS, and \(m\) denotes the machine. At present, an IBM RS6000/320H with 32 Mbytes of memory is priced at U.S.$15,034 (IBM, 1992) whilst the IBM RS6000/550 with 64 Mbytes of memory costs U.S.$52,500 (IBM, 1992). The price-(theoretical) peak performance ratio, \(P_m(\mathrm{peak}) = c : \rho_{\mathrm{cpu}}\), is thus 301 U.S.$/MFLOPS for the 320H and 629 U.S.$/MFLOPS for the 550. The 16 processor CRAY Y-MP C-90 is currently priced at ~U.S.$30,000,000 (the exact price depending on the details of the configuration), giving \(P_{\mathrm{C90}}(\mathrm{peak}) = c : \rho_{\mathrm{wc}} \approx 1875\) U.S.$/MFLOPS. Note that in calculating \(P\) we have used \(\rho_{\mathrm{cpu}}\) for the uniprocessor IBM RS6000s and \(\rho_{\mathrm{wc}}\) for the multiprocessor CRAY Y-MP C-90. On the basis of these price-(theoretical) peak performance ratios we have

\(P_{\mathrm{C90}}(\mathrm{peak}) : P_{\mathrm{320H}}(\mathrm{peak}) \approx 6.2\) and \(P_{\mathrm{C90}}(\mathrm{peak}) : P_{\mathrm{550}}(\mathrm{peak}) \approx 3.0\).
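The arithmetic behind these peak figures can be checked with a few lines of code. The following is a minimal sketch in Python and is not part of the original study; the function and variable names are illustrative assumptions. It assumes, as stated above, two floating point operations per cycle on the workstations, and it uses the rounded figure of ~16 GFLOPS quoted in the text for the 16 processor C-90 rather than the slightly lower value implied by the 4.2 ns cycle time.

```python
# Sketch: theoretical peak rates and price : peak-performance ratios.
# All figures come from the text and Table 1; the names are illustrative.

def peak_mflops(clock_mhz, flops_per_cycle):
    """Peak rate in MFLOPS: clock frequency in MHz times flops per cycle."""
    return clock_mhz * flops_per_cycle

def price_performance(cost_usd, rate_mflops):
    """Price-performance ratio P = c : rho, in U.S.$ per MFLOPS."""
    return cost_usd / rate_mflops

rho_peak = {
    "320H": peak_mflops(25.0, 2),   # compound add/multiply counted as 2 flops
    "550": peak_mflops(41.7, 2),
    "C-90": 16000.0,                # rounded wall clock peak for 16 processors
}
cost = {"320H": 15034.0, "550": 52500.0, "C-90": 30.0e6}

# One C-90 processor from its cycle time: 4 flops every 4.2 ns.
c90_cpu_peak = peak_mflops(1000.0 / 4.2, 4)

p_peak = {m: price_performance(cost[m], rho_peak[m]) for m in cost}
ratio_320h = p_peak["C-90"] / p_peak["320H"]
ratio_550 = p_peak["C-90"] / p_peak["550"]

for m in ("320H", "550", "C-90"):
    print(f"P_{m}(peak) = {p_peak[m]:.0f} U.S.$/MFLOPS")
print(f"C-90 : 320H = {ratio_320h:.1f}, C-90 : 550 = {ratio_550:.1f}")
print(f"single C-90 processor peak = {c90_cpu_peak:.0f} MFLOPS")
```

The single-processor value obtained from the cycle time is somewhat below 1000 MFLOPS, which is why the text quotes the C-90 peak rates as approximate.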
Table 1. Characteristics of the IBM RS6000 workstations used in the present study

                                           320H            550
CPU clock speed                            25 MHz          41.7 MHz
Cycle time                                 40.0 ns         24.0 ns
\(\rho_{\mathrm{cpu}}(\mathrm{peak})\)     50 MFLOPS       83.4 MFLOPS
Memory                                     32 Mbytes       64 Mbytes
Virtual memory                             128 Mbytes      160 Mbytes
Operating system                           AIX 3.1         AIX 3.1

We have employed a modified version of the recently published (Moncrieff et al., 1992b) ccMBPT4 code, which was written for the CRAY X-MP, Y-MP and C-90 machines and exploits the macrotasking capabilities of these machines. The code has also been adapted for the IBM 3090/600J VF computer (Moncrieff et al., 1991) and the Intel i860 GAMMA and DELTA machines (Rendell et al., 1993). Very recently, an implementation on the NEC SX-3/44 computer has been described (Moncrieff et al., 1992c). For the IBM RS6000 implementation
we made the following modifications to the published code:

(i) removed the macrotasking directives, which are specific to CRAY machines;
(ii) replaced the calls to subroutine MXMA, the CRAY routine for matrix multiplication, by corresponding calls to DGEMM;
(iii) converted from 32 bit word length to the 64 bit word length required to achieve the accuracy demanded in electron correlation energy studies.

As in our previous experiments (Wilson & Moncrieff, 1991; Moncrieff et al., 1992a, b, c; Rendell et al., 1993), the number of occupied orbitals, \(N_{\mathrm{occ}}\), was fixed at 4 and the behavior of the code was monitored as the number of virtual orbitals, \(N_{\mathrm{virt}}\), was increased. Specifically, the values \(N_{\mathrm{virt}}\) = 16, 32, 48, 64, 96, 112, 128 were considered. The measured processing times, \(\tau_{\mathrm{cpu}}\), for both of the workstations used are presented in Table 2 as a function of \(N_{\mathrm{virt}}\). The measured processing times previously obtained for the CRAY Y-MP C-90 are given in the fourth column. The measured times, \(\tau_{\mathrm{cpu}}\), are plotted as a function of \(N_{\mathrm{virt}}\) in Fig. 1. The values of \(\tau_{\mathrm{cpu}}\) increase markedly for basis sets containing more than ~100 functions, and for \(N_{\mathrm{virt}}\) = 128 the calculation requires ~3.3 and ~1.4 h on the 320H and 550 workstations, respectively.

Table 2. Performance observed on the IBM RS6000/320H and 550 as a function of \(N_{\mathrm{virt}}\), the size of the virtual orbital set, and comparison with the CRAY Y-MP C-90

\(\tau_{\mathrm{cpu}}\) (s)
\(N_{\mathrm{virt}}\)        320H         550        C-90
 16                          8.12        3.64        0.46
 32                         68.08       33.47        2.64
 48                        261.78      127.99        8.89
 64                        719.19      346.86       22.37
 96                       2934.63     1467.39       93.11
112                       5348.13     2560.79      161.31
128                      11923.86     5150.91      259.93

\(\rho_{\mathrm{cpu}}\) (MFLOPS)
\(N_{\mathrm{virt}}\)        320H         550        C-90
 16                         14.42       32.17      254.60
 32                         18.49       37.62      476.89
 48                         20.52       41.96      604.12
 64                         21.41       44.39      688.34
 96                         23.87       47.73      752.23
112                         23.48       49.04      718.55
128                         17.52       40.56      803.75

\(\tau_{\mathrm{cpu}}\) is the measured central processing unit time and \(\rho_{\mathrm{cpu}}\) is the rate of execution in millions of floating-point operations per central processing unit second.

Fig. 1. Measured processing times, \(\tau_{\mathrm{cpu}}\), for the IBM RS6000 320H and 550 workstations and the CRAY Y-MP C-90 as a function of \(N_{\mathrm{virt}}\), the number of virtual functions.

It should be remembered that the determination of electron correlation effects is only part, albeit a major part, of an electronic structure calculation and that the complete
computation may have to be carried out repeatedly if the molecular geometry is being optimized and/or if the convergence of the results with increasing quality of basis sets is investigated. Geometry optimization and/or basis set convergence studies may involve repeating the prototype calculations performed in this work upwards of ~20 times and lead to total computation times measured in days, weeks, or, when increasing numbers of electrons are considered, even months. The CRAY Y-MP C-90 allows a calculation for which \(N_{\mathrm{occ}}\) = 4, \(N_{\mathrm{virt}}\) = 256 to be completed in ~230 s.

Without a hardware monitor there is no automatic procedure for counting the number of floating point operations a particular code performs on the IBM RS6000 workstations. However, from our previous work (Moncrieff et al., 1992a), we have values of \(\tau_{\mathrm{cpu}}\) and the rate of execution, \(\rho_{\mathrm{cpu}}\), on the CRAY Y-MP C-90, and their product is the total number of floating point operations, \(N\). The rate of execution on the workstations is then the ratio of \(N\) to \(\tau_{\mathrm{cpu}}\) for the appropriate machine. Values of \(\rho_{\mathrm{cpu}}\) calculated in this fashion are presented in the second and third columns of Table 2. Corresponding values of \(\rho_{\mathrm{cpu}}\) for the C-90 are given in the fourth column; some values of \(\rho_{\mathrm{wc}}\) for this machine have been given previously (Moncrieff et al., 1992a).

The rates of execution, \(\rho\), are plotted on a logarithmic scale in Fig. 2 as a function of \(N_{\mathrm{virt}}\). \(\rho_{\mathrm{cpu}}\) is shown for uniprocessors and \(\rho_{\mathrm{wc}}\) is displayed for multiprocessors. The multiprocessors considered are CRAY Y-MP C-90 with 4, 8 and 16 dedicated processors, which are designated C-90(n), n = 4, 8, 16 in Fig. 2.

The IBM RS6000 workstations are virtual memory machines. The rate of computation achieved can be seen from Fig. 2 to degrade somewhat for basis sets containing more than ~100 functions. This is attributable to the paging which takes place when the real memory will not accommodate both the two-electron integrals and intermediates which arise in the computation.

Fig. 2. Rate of execution, \(\rho\), as a function of \(N_{\mathrm{virt}}\), the number of virtual functions. \(\rho_{\mathrm{cpu}}\) is given for uniprocessors and \(\rho_{\mathrm{wc}}\) for multiprocessors. C-90(n) denotes an implementation on n dedicated processors of the CRAY Y-MP C-90.

It is useful to calculate the price : actual performance ratio, \(P_m(\mathrm{actual})\). For \(N_{\mathrm{virt}}\) = 128, we obtain \(P_{\mathrm{320H}}(\mathrm{actual})\) = 858 U.S.$/MFLOPS and \(P_{\mathrm{550}}(\mathrm{actual})\) = 1294 U.S.$/MFLOPS. When the performance actually achieved is considered, the price-performance ratio is increased by ~200 and ~100% over that for peak performance for the 320H and 550, respectively: \(P_{\mathrm{320H}}(\mathrm{actual}) : P_{\mathrm{320H}}(\mathrm{peak}) \approx 2.9\) and \(P_{\mathrm{550}}(\mathrm{actual}) : P_{\mathrm{550}}(\mathrm{peak}) \approx 2.1\). In comparison, for a dedicated 16 processor CRAY Y-MP C-90, \(\rho_{\mathrm{wc}}\) = 13338.54 MFLOPS has been recorded for \(N_{\mathrm{virt}}\) = 256, giving \(P_{\mathrm{C90}}(\mathrm{actual}) \approx 2249\) U.S.$/MFLOPS, which represents an increase of only ~20% over the ratio for the peak performance: \(P_{\mathrm{C90}}(\mathrm{actual}) : P_{\mathrm{C90}}(\mathrm{peak}) \approx 1.2\). Comparing the price : actual performance ratios for the IBM RS6000 workstations with those for the CRAY Y-MP C-90, we find \(P_{\mathrm{C90}}(\mathrm{actual}) : P_{\mathrm{320H}}(\mathrm{actual}) \approx 2.6\), that is 42% of the corresponding ratio for peak performance, and \(P_{\mathrm{C90}}(\mathrm{actual}) : P_{\mathrm{550}}(\mathrm{actual}) \approx 1.7\), 57% of the price : peak performance ratio.
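The indirect estimate of the workstation execution rates, and the price : actual performance ratios that follow from it, can be illustrated with a short calculation. The sketch below is in Python and is not part of the original study; the names are illustrative assumptions and only three rows of Table 2 are included for brevity. It takes the floating point operation count as the product of the C-90 time and rate, divides by the measured workstation time, and then forms the ratios quoted above for the largest basis set considered.

```python
# Sketch: workstation rates from C-90 data, then price : actual ratios.
# Data are selected rows of Table 2 and prices from the text.

# C-90: N_virt -> (tau_cpu in s, rho_cpu in MFLOPS)
c90 = {16: (0.46, 254.60), 64: (22.37, 688.34), 128: (259.93, 803.75)}
# Workstations: measured tau_cpu in s for the same N_virt values
tau = {
    "320H": {16: 8.12, 64: 719.19, 128: 11923.86},
    "550": {16: 3.64, 64: 346.86, 128: 5150.91},
}
cost = {"320H": 15034.0, "550": 52500.0, "C-90": 30.0e6}
peak = {"320H": 50.0, "550": 83.4, "C-90": 16000.0}  # MFLOPS

def flop_count(n_virt):
    """Operation count in millions of flops: tau_cpu x rho_cpu on the C-90."""
    tau_c90, rho_c90 = c90[n_virt]
    return tau_c90 * rho_c90

def workstation_rate(machine, n_virt):
    """Estimated rho_cpu in MFLOPS: operation count / measured time."""
    return flop_count(n_virt) / tau[machine][n_virt]

# Achieved rates: workstations at N_virt = 128; C-90 value is the recorded
# 16 processor wall clock rate for N_virt = 256 quoted in the text.
actual = {m: workstation_rate(m, 128) for m in tau}
actual["C-90"] = 13338.54

p_actual = {m: cost[m] / actual[m] for m in cost}
p_peak = {m: cost[m] / peak[m] for m in cost}

for m in ("320H", "550", "C-90"):
    print(f"{m}: rho = {actual[m]:.2f} MFLOPS, "
          f"P(actual) = {p_actual[m]:.0f} U.S.$/MFLOPS, "
          f"actual : peak = {p_actual[m] / p_peak[m]:.1f}")
print(f"C-90 : 320H = {p_actual['C-90'] / p_actual['320H']:.1f}")
print(f"C-90 : 550 = {p_actual['C-90'] / p_actual['550']:.1f}")
```

Because the operation count is inferred from the C-90 run rather than counted on the workstation, the resulting rates rest on the assumption that the modified code performs the same number of floating point operations on both architectures.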
For many-body perturbation theory calculations employing fewer than ~100 basis functions, the IBM RS6000 workstations provide a viable alternative to a state-of-the-art supercomputer. For larger basis sets one can envisage the computation becoming quite lengthy, particularly if geometry optimization and/or basis set development is involved. The price : actual performance ratios for the workstations and the CRAY Y-MP C-90 differ by a factor of two in favor of the former. Hence, provided that both the workstations and the supercomputer are efficiently managed, the workstations are more cost-effective. However, should significant idle time accumulate on a workstation (e.g. if it were switched off at night) the situation could easily be reversed.

Acknowledgements-DM acknowledges the support of the U.S. Department of Energy through contract No. DE-FC05-85ER2500000. SW acknowledges the support of the Supercomputer Computations Research Institute of Florida State University in making possible a visit during which most of the results reported in this note were obtained.
REFERENCES

Baker D. J., Moncrieff D., Saunders V. R. & Wilson S. (1991) Comput. Phys. Commun. 62, 25.
IBM RISC System/6000 Quick Reference Guide, U.S.A. Commercial List Prices, as of 24 April 1992.
Moncrieff D., Saunders V. R. & Wilson S. (1991) Rutherford Appleton Laboratory Report, RAL 91064, Oxford, England.
Moncrieff D., Saunders V. R. & Wilson S. (1992a) Supercomputer 49, 4.
Moncrieff D., Saunders V. R. & Wilson S. (1992b) Comput. Phys. Commun. 70, 345.
Moncrieff D., Saunders V. R. & Wilson S. (1992c) Rutherford Appleton Laboratory Report, RAL 92034, Oxford, England.
Rendell A. P., Lee T. J., Komornicki A. & Wilson S. (1993) Theoret. Chim. Acta. In press.
Wilson S. & Guest M. F. (1980) Chem. Phys. Lett. 73, 607.
Wilson S. & Saunders V. R. (1980) Comput. Phys. Commun. 19, 293.
Wilson S. (1984) Electron Correlation in Molecules. Clarendon Press, Oxford.
Wilson S. & Moncrieff D. (1991) Supercomputer 45, 28.