
Computer Physics Communications 26 (1982) 349–352
North-Holland Publishing Company

NUMERICAL MULTIPLE INTEGRATION ON PARALLEL COMPUTERS

Alan C. GENZ
Mathematical Institute, University of Kent, Canterbury, Kent CT2 7NF, England

This paper discusses the implementation of methods for the approximate calculation of multiple integrals on a SIMD parallel computer. Adaptive methods using polynomial integrating, Monte Carlo and number theoretic basic rules are considered, with particular reference to implementation on the ICL DAP computer. Test results are given which compare serial and parallel versions of the same method.

1. Introduction

The numerical calculation of multiple integrals has always been a problem area where there has been a demand for large amounts of computing time. The purpose of this paper is to discuss the application of SIMD [1] type parallel computers to this problem. First, we give a brief account of the structure of current algorithms. This leads to a discussion of basic multidimensional integration rules. Finally, we report on the use of some of the methods described with the ICL DAP computer, a SIMD type computer with 4096 parallel processors. We consider multiple integrals of the form

I(f) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n,

and assume that the integrand f(x) is computable for any x in the integration region R, and that the purpose of an automatic integration algorithm is to compute I(f) to within a requested error ε using as little computing resource as possible. Adaptive algorithms use strategies that attempt to concentrate the integrand evaluation points in those subregions of R where f(x) is most irregular. There are many different types of adaptive algorithm, but most proceed in stages, with each successive stage using a new subdivision of R. We consider a "globally" adaptive algorithm, where at each stage there is a list of subregions. Associated with each subregion are its location and dimensions, an estimated integral value and an estimated error. Such an algorithm proceeds by subdividing the subregion with the largest error into at least two new subregions and applying to each one a given basic integration rule, which provides estimates for the integral and error in that subregion. The new subregions are then incorporated into the list, and the new values for the subregion integrals and errors are used to update global estimates for the integral and error. The algorithm continues until either the global error estimate is less than ε or the computing resources are exhausted.

The resources used by adaptive algorithms fall into two categories: time resources and space resources. The space resources required depend mainly on the number of subregions considered during the final stage of the algorithm. Usually the easiest way to reduce the space requirement is to use a better basic rule, so that the algorithm terminates at a stage requiring less subdivision; but a better basic rule will usually require more time per subregion (in a nonparallel algorithm) because such a rule uses more integrand values. A parallel algorithm is therefore unlikely to reduce the space required directly, unless it allows the use of more accurate basic rules, which may also reduce the time. For many problems the time resources are the most important anyway, so we concentrate on these. The time used by adaptive algorithms may be divided into T_d, the total time used for decisions


about the choice of the best subdivision, and T_f, the total time used by the basic rule for integrand evaluations. With serial algorithms we usually have T_f >> T_d, but for parallel algorithms on a SIMD computer with M processors, where we assume M integrand evaluations are carried out in parallel, the time per integrand evaluation will decrease as M increases, so the time proportions may change. However, as M will be fixed for any given machine, there will always be integrands of sufficient difficulty that T_f >> T_d, and these will be the integrands that require the most total time. For such integrands we might expect a speedup factor of as much as M/r_p, where r_p is the ratio of the time for a typical single number operation on the parallel machine to that time on the serial machine. The speedup when compared with vector machines could be similar, because these machines are most efficient when there are easily pipelined operations to be performed in an inner loop. For adaptive integration the equivalent of the inner loop is the integrand evaluation, which usually involves a sequence of different and sometimes complicated arithmetic operations not particularly suitable for pipelining.

2. Basic rules

Basic rules for multiple integrals take the form

B(f) = \sum_{i=1}^{P} w_i f(x_i).

The weights w_i and the integrand evaluation points x_i determine the type of basic rule, and each application to the integrand in a given subregion requires an appropriate transformation of the weights and points. The most efficient use of the parallel computer requires that all of the M processors be used simultaneously as much as possible, so we would ideally want basic rules that have P as a) some multiple or b) some divisor of M. For case a), with P = kM say, the algorithm selects a subregion, the integrand evaluations are carried out in k parallel phases, the algorithm selects the next subregion, and so on. For this case little change in the original algorithm is necessary and only the basic rule need be in parallel form. For case b), with M = KP, the algorithm must generate the necessary information about the chosen K subregions and organise this information so that the K applications of the basic rule can be carried out in parallel. This case will sometimes require significant revision of the original main algorithm and additional computation setting up subregion information, but when M is large this case will be preferred in order to allow the subroutine to be fully adaptive.

There are three main types of basic rule in current use for multidimensional integration: a) Monte Carlo, b) number theoretic and c) polynomial integrating (see refs. [2,3] for general references). If we assume that we have some specific subregion S with volume V, the Monte Carlo rules, in their simplest form, have all weights equal to V/P and the points x_i are uniformly distributed (pseudo)random points chosen from S. The Monte Carlo rules are probably the easiest to use on a parallel machine because the number P can be chosen freely as either a divisor or multiple of M. The random points are also easily generated in parallel. Unfortunately, the Monte Carlo methods, although usually robust, are generally the least efficient. They sometimes provide low accuracy results quickly, but the convergence to higher accuracy is poor. There are several types of number theoretic rules [2,3]. Usually these rules are given in a form similar to the Monte Carlo rules, except that the number P must be prime and the x_i are chosen with number theoretic properties that make the rules optimal for certain classes of functions. These rules are usually more efficient than the Monte Carlo rules, but they are slightly more difficult to implement in a completely efficient parallel form because divisors and multiples of M are not likely to be prime. The polynomial integrating rules have weights and points chosen so that they are exact for as high a degree of polynomial as possible. The points, weights and P are all highly dependent on n and the rule degree. This type of rule will usually be the most difficult to implement in an efficient parallel form, but for many types of integrand these rules are considerably better than rules of the other types.


The superior rule efficiency should compensate for loss of parallelism with most problems.
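As a purely illustrative sketch of the two simplest rule families described above, the Python fragment below implements a plain Monte Carlo basic rule (all weights V/P, uniform random points) and a rank-1 lattice rule of the number theoretic type. The names mc_rule and lattice_rule, the generator vector z and the fixed seed are our own illustrative choices, not part of the original subroutine.

```python
import math
import random

def mc_rule(f, a, b, P, seed=0):
    """Simplest Monte Carlo basic rule on S = [a1,b1] x ... x [an,bn]:
    all weights equal to V/P, points uniform (pseudo)random in S."""
    rng = random.Random(seed)
    V = math.prod(bi - ai for ai, bi in zip(a, b))   # volume of S
    total = 0.0
    for _ in range(P):
        x = [ai + rng.random() * (bi - ai) for ai, bi in zip(a, b)]
        total += f(x)
    return V * total / P

def lattice_rule(f, a, b, P, z):
    """Rank-1 lattice (number theoretic) rule: P prime, generator vector z;
    points x_k = frac(k z / P) mapped into S, weights again V/P."""
    V = math.prod(bi - ai for ai, bi in zip(a, b))
    total = 0.0
    for k in range(P):
        u = [(k * zi / P) % 1.0 for zi in z]          # point in [0,1)^n
        x = [ai + ui * (bi - ai) for ai, bi, ui in zip(a, b, u)]
        total += f(x)
    return V * total / P
```

Because P is free for the Monte Carlo rule it can be matched exactly to the machine width M, while the lattice rule needs a prime P (e.g. 4099, the first prime above 4096), which illustrates why a fully efficient parallel form is harder to arrange.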


3. Parallel algorithms on the DAP

In this section we discuss practical experience using modifications of one general purpose adaptive multidimensional integration subroutine. This subroutine is a globally adaptive routine similar to the routine ADAPT [4]. The basic rule is a polynomial integrating rule of degree seven which has a rule of degree five imbedded in it for error estimation. The adaptive strategy involves halving at each stage the subregion with the largest error along the axis where the absolute fourth difference in the integrand is largest. The number of integrand evaluations P^(n) for the rule is given below.

n       2    3    4    5    6     7     8     9     10
P^(n)   17   33   57   93   149   241   401   693   1245

For parallel work on the DAP we have used a modified version of the original algorithm. First we define K^(n) to be the largest integer so that 2K^(n)P^(n) ≤ 4096. At stage one the original integration region is divided by the host routine into approximately 2K^(n) subregions (instead of one for the original algorithm), and the relevant subregion information is passed to the DAP, which applies the basic rule to all subregions simultaneously and returns the integrals, errors and subdivision axis information to the host routine. This information is placed in an error keyed partially ordered list. From then on the same steps are repeated: i) the K^(n) subregions with the largest errors are halved along the appropriate axes, ii) the DAP is called to apply the basic rule to the resulting 2K^(n) subregions and iii) the new results are returned to the host, accumulated and ordered. An integrand is defined as a DAP (64 × 64) matrix function that takes for its evaluation point argument an n-dimensional array of 64 × 64 matrices. We give below some test results using the test integrand

f(x) = \cos\left(2\pi r + \sum_{i=1}^{n} c_i x_i\right).

We used approximately 50000 integrand calls per individual test, with ten samples of the integrand for each n in the range 3–7. For each sample, parameters r, c_i were randomly chosen from [0,1] and then the c_i were scaled so that \sum_{i=1}^{n} c_i = 100. The overall test consisted of a total of 50 calls of the subroutine and approximately 2.5 million integrand evaluations. We compare with results of the same tests carried out using ADAPT on a CDC 7600 (optimised FTN compiler). The average accuracies of the returned results were comparable, so we give total test times only. The parallel results were obtained on the ICL 2980/DAP system at Queen Mary College, University of London. There are several sets of tests. The first test (labeled f) used the integrand as given; further tests (labeled ×m) were the same except that the integrand evaluation times were increased artificially by using m integrand evaluations for each evaluation point.

Total test times (s)

Machine           f          ×10    ×100   ×1000
CDC 7600          59         251    2165   21329
ICL 2980 + DAP    42 + 113   125    238    1367

The DAP times indicate that even for the fairly simple integrands used in the tests we have T_f > T_d, where the DAP time includes the time to set up the integrand evaluation point array. Comparison between the different DAP test results shows that the first test times were dominated by setup time, but the later tests, involving integrands which needed more time to compute, were not. The differences between the DAP and CDC times show that even for fairly simple integrands the parallelism of the DAP compensates for its relatively slow arithmetic, and for more difficult integrands the speedup factor can be greater than ten.
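The staged, globally adaptive strategy described in this section can be sketched in serial Python as follows. This is only a reader's illustration under stated simplifications: basic_rule here is a crude centre-plus-edge sampling rule with a second-difference error estimate, standing in for the paper's degree seven/five pair, and the K rule applications per stage that the DAP performs simultaneously are simply run in a loop.

```python
import heapq
import math

def basic_rule(f, center, half):
    """Stand-in basic rule (not the degree 7/5 pair used in the paper):
    one centre sample plus two edge samples per axis, a crude error
    estimate from the second differences, and the axis of largest
    difference as the suggested subdivision axis."""
    n = len(center)
    V = math.prod(2.0 * h for h in half)          # volume of the subregion
    fc = f(center)
    avg = fc
    diffs = []
    for i in range(n):
        xp = list(center); xp[i] += 0.5 * half[i]
        xm = list(center); xm[i] -= 0.5 * half[i]
        fp, fm = f(xp), f(xm)
        avg += fp + fm
        diffs.append(abs(fp - 2.0 * fc + fm))     # second difference, axis i
    estimate = V * avg / (2 * n + 1)
    error = V * sum(diffs)
    axis = max(range(n), key=lambda i: diffs[i])
    return estimate, error, axis

def adapt(f, a, b, eps, max_stages=1000, K=1):
    """Globally adaptive driver: each stage halves the K subregions with
    the largest estimated errors along their suggested axes and applies
    the basic rule to the resulting 2K halves (the 2K applications are
    what a machine like the DAP would perform simultaneously)."""
    center = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
    half = [(bi - ai) / 2.0 for ai, bi in zip(a, b)]
    est, err, axis = basic_rule(f, center, half)
    heap = [(-err, est, center, half, axis)]      # error keyed list
    total_est, total_err = est, err
    for _ in range(max_stages):
        if total_err < eps or not heap:
            break
        worst = [heapq.heappop(heap) for _ in range(min(K, len(heap)))]
        for negerr, est0, c, h, ax in worst:
            total_est -= est0                     # retire old contribution
            total_err += negerr
            h2 = list(h); h2[ax] *= 0.5
            for sign in (-1.0, 1.0):              # the two halves
                c2 = list(c); c2[ax] += sign * h2[ax]
                e, r, ax2 = basic_rule(f, c2, h2)
                heapq.heappush(heap, (-r, e, c2, h2, ax2))
                total_est += e
                total_err += r
    return total_est, total_err
```

For f(x) = x_1 + x_2 on [0,1]^2 the stand-in rule is exact, so adapt returns 1.0 with zero estimated error after a single rule application; less trivial integrands drive the subdivision loop.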

4. Conclusion

In this brief paper we have given a rather general discussion of multiple numerical integration on a SIMD type parallel computer, along with


a report of some practical results obtained using the ICL DAP. The results given show that parallel algorithms can give substantial improvements in times required, particularly with integrands that are time consuming to compute. There is clearly a need for more detailed investigations of a) the effect of the type of integrand on the CPU time for the specific algorithm discussed here, and b) other algorithms with other types of basic rules.

References

[1] M.J. Flynn, Proc. IEEE 54 (1966) 1901.
[2] A.H. Stroud, Approximate calculation of multiple integrals (Prentice-Hall, Englewood Cliffs, New Jersey, 1971).
[3] P. Davis and P. Rabinowitz, Numerical integration (Academic Press, New York, 1975).
[4] A. Genz and A.A. Malik, J. Comput. Appl. Math. 6 (1980) 295.