Mathematics and Computers in Simulation 44 (1998) 545–558
A distributed memory parallel algorithm for the efficient computation of sensitivities of differential-algebraic systems
B.R. Keeping, C.C. Pantelides*
Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, London SW7 2BY, UK
Received 4 September 1996; revised 7 October 1997; accepted 16 October 1997
* Corresponding author.
Abstract

An efficient algorithm for the evaluation of the parametric sensitivities for mixed systems of differential and algebraic equations (DAEs) on computers involving multiple processors operating in parallel is presented. The algorithm derives its efficiency by decoupling the integration of the sensitivity equations from that of the original DAE system, and by allowing tasks associated with the evaluation of sensitivities at multiple time points to overlap instead of being carried out in sequence. Numerical experiments demonstrating the efficiency of the proposed algorithm with systems of more than 850 DAEs and 45 parameters are presented. In all the cases studied, the simultaneous integration of the original DAEs and their sensitivity equations is carried out in less than 10% more time than that of the original DAEs alone. © 1998 IMACS/Elsevier Science B.V.

Keywords: Differential-algebraic equations; Sensitivities; Parallel computation
1. Introduction

Complex dynamic models in physics and engineering generally require the values of numerous input parameters. Sensitivity coefficients, which are defined as the partial derivatives of the final outputs of the model with respect to these parameters, can play a useful role in their own right in the analysis of such models [12]. Moreover, when the determination of optimal values for these input parameters is formulated as a constrained optimisation problem, the sensitivities play an essential role in the calculation of the gradients of both the objective function and the constraints with respect to these parameters.

Procedures for computing parametric sensitivities for models expressed as systems of differential-algebraic equations (DAEs) are well established. However, the cost of this computation increases
both with the size of the underlying system and with the number of input parameters, to the extent that it becomes the dominant consumer of computational effort even for moderately large problems. In this work, we exploit the power of distributed memory parallel computing by seeking to decouple the computational tasks involved in sensitivity analysis from the underlying operation of solving the original DAE system. Our ultimate goal is to carry out both tasks simultaneously in approximately the same amount of time as would be required for the integration of the original system alone on a single processor.

The rest of this paper is organised as follows. The next section provides a brief review of the necessary mathematical background on the computation of sensitivities of DAE systems. Section 3 describes the proposed parallelisation strategy together with a theoretical analysis of its expected performance. Finally, Section 4 presents computational results obtained by applying the parallelised algorithm to three examples of varying size.

2. Mathematical background

In this section, we briefly review the background to the method of sensitivity calculation on which our work is based, highlighting the decisions made in arriving at our particular implementation.

2.1. Sensitivity equations for DAE systems

Consider the DAE system

$$ f(x, \dot{x}, \theta, t) = 0 \qquad \forall t \in [0, t_F] \qquad (1) $$
where $x(t)$ is a vector of $N$ state variables, and $\theta$ is a set of $N_p$ time-invariant parameters. Generally, the matrix $\partial f/\partial \dot{x}$ will be singular. We further assume that (1) is an index-one DAE system [1], i.e. that (1) together with the equations derived by differentiating (1) once with respect to time define unique values for $\dot{x}$ for any given set of values for $x$ and $\theta$. The initial condition of the above system is described in terms of a set of equations of the form

$$ C(x(0), \dot{x}(0), \theta) = 0 \qquad (2) $$
We assume that, once $\theta$ is specified, Eq. (2) together with (1) at $t = 0$ determine a unique set of initial values $x(0)$ and $\dot{x}(0)$, and that the DAEs (1) then define a unique trajectory for $x(t)$ for $t \in [0, t_F]$. Consideration of the total derivative of (1) with respect to $\theta$ yields the following auxiliary DAE system (the 'sensitivity equations' of the original system), which will be used to determine the final-time sensitivities $x_\theta(t_F)$:

$$ \frac{\partial f}{\partial x}\, x_\theta + \frac{\partial f}{\partial \dot{x}}\, \dot{x}_\theta + \frac{\partial f}{\partial \theta} = 0 \qquad (3) $$
with initial conditions

$$ \frac{\partial C}{\partial x}\, x_\theta(0) + \frac{\partial C}{\partial \dot{x}}\, \dot{x}_\theta(0) + \frac{\partial C}{\partial \theta} = 0 \qquad (4) $$
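As a concrete illustration of (1)-(4), consider a two-equation, index-one system with a single parameter $\theta$ (a minimal example of our own, not one from the paper):

```latex
% Illustrative example only: a two-equation, index-one DAE with one parameter,
%   f(x, \dot{x}, \theta, t) = 0  with initial condition  C = x_1(0) - 1 = 0.
\begin{align*}
  \dot{x}_1 - \theta x_2 &= 0, & x_1 + x_2 - 1 &= 0, & x_1(0) - 1 &= 0.
\end{align*}
% Differentiating with respect to \theta gives the sensitivity system (3)-(4),
% which is linear in (x_{1,\theta}, x_{2,\theta}) and shares the structure of
% the original DAE:
\begin{align*}
  \dot{x}_{1,\theta} - \theta x_{2,\theta} - x_2 &= 0, &
  x_{1,\theta} + x_{2,\theta} &= 0, &
  x_{1,\theta}(0) &= 0.
\end{align*}
```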
2.2. Simultaneous numerical solution of DAEs and sensitivity equations

A well-established class of methods for the solution of both the original system (1) and the sensitivity equations (3) is based on backward difference formulae (BDF) proposed by Gear [7]. The use of the BDF approximation in the former case leads to a set of nonlinear algebraic equations at each time step $n$, which is then solved by a modified Newton iteration to convergence. In the latter case, the relevant BDF formula is

$$ \dot{x}_n = \alpha_n \left( x_n - \sum_{j=1}^{k} \beta_j\, x_{n-j} \right) \qquad (5) $$
where $\alpha_n$ and $\beta_j$, $j = 1, \ldots, k$, depend on the current integration stepsize and order $k$. Its use leads to a system which, as noted by Caracotsios and Stewart [2], is linear with respect to the variables $x_\theta$ and involves a matrix identical to the Jacobian used for the integration of the original system,

$$ J_n = \frac{\partial f}{\partial x} + \alpha_n \frac{\partial f}{\partial \dot{x}} \qquad (6) $$
where all partial derivatives are evaluated at $(x_n, \dot{x}_n, \theta, t_n)$. The values of the sensitivities $x_\theta$ at step $n$ can be obtained by direct solution of the linear system

$$ J_n\, (x_\theta)_n = -\left( \frac{\partial f}{\partial \theta} - \alpha_n \frac{\partial f}{\partial \dot{x}} \sum_{j=1}^{k} \beta_j\, (x_\theta)_{n-j} \right) \qquad (7) $$
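For concreteness, one such step can be sketched as follows (our illustration, not the paper's code; scipy's splu stands in for the sparse LU factorisation actually used, which is Harwell MA28/MA48, and all names are ours):

```python
# A minimal sketch of one sensitivity step via Eq. (7); scipy's splu stands in
# for the MA28 factorisation of the paper, and all names here are assumptions.
import scipy.sparse.linalg as spla

def sensitivity_step(f_x, f_xdot, f_theta, alpha_n, beta, xtheta_hist):
    """Solve J_n (x_theta)_n = -(f_theta - alpha_n f_xdot sum_j beta_j (x_theta)_{n-j}).

    f_x, f_xdot : (N, N) sparse blocks of (6), evaluated at (x_n, xdot_n, theta, t_n)
    f_theta     : (N, Np) array of partials of f with respect to each parameter
    beta        : BDF weights beta_1 .. beta_k for the current order k
    xtheta_hist : xtheta_hist[j-1] = (x_theta)_{n-j}, each of shape (N, Np)
    """
    J_n = (f_x + alpha_n * f_xdot).tocsc()
    hist = sum(b * h for b, h in zip(beta, xtheta_hist))
    rhs = -f_theta + alpha_n * (f_xdot @ hist)
    lu = spla.splu(J_n)           # one exact factorisation per step (cf. Section 2.3)
    return lu.solve(rhs)          # Np back-substitutions sharing the same factors
```

Note that all $N_p$ right-hand sides are solved against a single factorisation, which is precisely the saving exploited by the parallel algorithm below.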
2.3. Error control for the sensitivity calculations

It has been pointed out [6] that, provided the exact Jacobian $J_n$ is used in (7), the sensitivity values computed at each time step by the procedure outlined above are the exact sensitivities of the solution of the time-discretised system. Numerical experiments also confirm that there is little to gain by attempting to apply a local error test to the sensitivity values. Failure of such a test may be regarded as indicating simply that the discretised system does not behave in a particularly smooth manner at the point in question. In fact, if this test is used, sensitivity integrations at a given tolerance level will sometimes take far longer than they would without it. In such cases, greater overall accuracy in the gradient evaluation can probably be achieved at lower cost by applying more stringent tolerances to the underlying integration, with no separate test for the sensitivities. The approach we adopt, therefore, is to rely on an accurate solution of the DAE system, and on the use of a newly computed Jacobian at every step for the sensitivity calculation, in order to control the error in the sensitivities.

3. Parallelisation strategies for sensitivity calculation

We now turn to consider appropriate mechanisms for the parallelisation of the sensitivity calculations.
3.1. Analysis of the major computational costs

Computational experience with serial implementations of the sensitivity procedure makes it abundantly clear that integrations with sensitivity calculation are considerably more expensive than those without, and increasingly so as the number of parameters rises. The principal reasons for this behaviour are the following:
- For a system involving $N_p$ parameters $\theta$, the determination of the sensitivities requires the solution of $N_p$ linear systems of the form (7) at each step. However, it should be noted that substantial savings are possible as all of these involve the same matrix $J_n$ on the left-hand side.
- For the correct determination of the sensitivities, the Jacobian matrix $J_n$ used in (7) must be an accurate representation of the matrix (6) at the current values of the variables $x_n$. Therefore, in practice, this matrix has to be computed (analytically or using finite difference perturbations) and factorised into its lower and upper triangular factors at each step. This is in contrast to the matrix approximation $\tilde{J}_n$ used in the integration of the original DAE system, the quality of which affects only the rate of convergence of the Newton-type iterations, but not the solution $x_n$ obtained eventually. Indeed, modern DAE codes derive much of their efficiency by computing and factorising the Jacobian matrix as infrequently as possible.

3.2. Overview of the parallelisation strategy

Our parallelisation strategy (see Fig. 1) is to separate the additional computation required for the sensitivity calculations from the basic DAE integration, utilising multiple processors for the two major activities involved, namely Jacobian calculation and factorisation (once per step) and backsubstitution (once per parameter per step). A 'manager' process is used to ensure that the communication activities of the other processes are kept as simple as possible. Accordingly, we distinguish four types of processes as follows:

(i) The Integrator (DAE-INT). This carries out the solution of the DAE system (1) subject to the initial conditions (2) for given values of the parameters $\theta$. A standard DAE integration algorithm based on the BDF class of methods is used for this purpose. In particular, no modification is made to the frequency of evaluating and factorising the Jacobian matrix approximation $\tilde{J}_n$. The Integrator communicates with the Sensitivities Manager (SM), passing to it the values of the variables $x_n$ at the end of each successful integration step.

(ii) The Sensitivities Manager (SM). This process manages the communication between the Integrator process and the processes that carry out the sensitivity calculations. Upon receiving a new set of values $x_n$ from the former, it sends it to the next available Jacobian Handler (JH). Some communication with the Sensitivity Solvers (SS) (see below) is also necessary; for example, when a change of integration stepsize or order takes place in the Integrator, they need to be notified so that they can take this into account in the evaluation of the right-hand side of (7).

(iii) The Jacobian Handler(s) (JH). Each of these processes has responsibility for evaluating and computing the LU factorisation of the Jacobian matrix (6) at a point $x_n$ passed to it by the Sensitivities Manager. Once this is completed, the original Jacobian matrix and its factorisation are sent to all Sensitivity Solvers.
Fig. 1. Parallelised dynamic optimisation algorithm.
(iv) The Sensitivity Solvers (SS). Each of these processes has ultimate responsibility for computing and maintaining the time variation of the sensitivity of the system variables $x$ with respect to a subset of the vector of parameters $\theta$ throughout the entire computation. Thus, the maximum possible number of such processes is equal to the number of parameters $N_p$; however, more than one parameter may be allocated to each such process. Upon receiving the Jacobian $J_n$ and its factorisation, the Sensitivity Solver evaluates the right-hand side of (7) for each parameter $\theta_i$ which it has been allocated, and then proceeds to carry out a set of backsubstitution operations to evaluate the current sensitivity values $(x_{\theta_i})_n$. The values of the sensitivities $(x_{\theta_i})_{n-j}$ for $j = 0, \ldots, k_{\max}$, where $k_{\max}$ is the maximum order allowed by the BDF integrator, are retained locally in a history array to allow the evaluation of the summation on the right-hand side of (7) at subsequent steps. Maintaining an up-to-date version of the history information is another important task carried out by the Sensitivity Solver.
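The per-step work of one SS, including the maintenance of the history array, can be sketched as follows (our illustration only; the class and its names are hypothetical, all message passing is elided, and lu denotes the factorisation received from a JH):

```python
# Sketch of one Sensitivity Solver's per-step work; illustrative only.
import numpy as np
from collections import deque

class SensitivitySolver:
    def __init__(self, n_states, my_params, kmax):
        self.my_params = my_params              # parameter indices allocated to this SS
        # History of (x_theta)_{n-j}, newest first; zeros stand in for the
        # initial sensitivity values before the first step.
        self.history = deque(
            [np.zeros((n_states, len(my_params))) for _ in range(kmax + 1)],
            maxlen=kmax + 1)

    def step(self, lu, f_xdot, f_theta, alpha_n, beta):
        """Form the right-hand side of (7) and back-substitute for our parameters."""
        hist = sum(b * h for b, h in zip(beta, self.history))
        rhs = -f_theta[:, self.my_params] + alpha_n * (f_xdot @ hist)
        xtheta_n = lu.solve(rhs)                # one back-substitution per parameter
        self.history.appendleft(xtheta_n)       # keep the local history array current
        return xtheta_n
```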
The above description covers the normal operation of the parallel algorithm. There are also some other occasional interactions between the processes. For instance, it is often convenient for external programs to receive the values of both $x_n$ and $(x_\theta)_n$ via a single call to the DAE-INT process. In order to allow the DAE-INT to report values of the sensitivities to any external program calling it, we have to arrange for this information occasionally to be passed from the SSs to the SM, and from there to the DAE-INT. This communication is shown in dashed lines in Fig. 1.

3.3. Treatment of linear algebra

It is important to note that the major computational effort of the entire sensitivity calculation goes into the linear algebra operations. Specifically, the use of (7) implies the evaluation of the exact Jacobian matrix, as well as the solution of $N_p$ systems involving it at every step. We therefore review the approach to linear algebra employed in our underlying serial algorithm, before presenting the adaptation of this approach to the parallel context.

3.3.1. Background

For typical problems, $J$ is a large, sparse but unstructured matrix. Thus, we choose direct sparse linear algebra algorithms such as MA28 or MA48 [4,5], which reduce $J$ to lower and upper triangular factors $L$ and $U$, employing pivoting strategies such as the Markowitz criterion [10] to minimise fill-in (i.e., the generation of new non-zero elements) while ensuring numerical stability.

Many applications involve a series of matrices which are all of the same structure and where the numerical values are likely to be broadly similar from one factorisation to the next. This is certainly the case with $J_n$ in our algorithm. Because the process of selecting among all possible pivots at each stage of the factorisation is relatively expensive, repeat factorisations simply employ the sequence of pivots which succeeded on the previous occasion. Thus, considering the MA28 routines used in our original serial code, the work is divided between three routines as follows:
(i) An 'A' routine to perform the initial factorisation with pivot selection. This routine is also used whenever the 'B' routine fails during a repeat factorisation.
(ii) A 'B' routine to repeat the factorisation with a new Jacobian, retaining the previous pivot sequence.
(iii) A 'C' routine to make use of the factorisation resulting from either the 'A' or the 'B' routine to solve a particular linear system.

For the purposes of our parallel implementation, the communication requirements resulting from the use of these three routines are an essential issue. If the original matrices $J_n$ have dimensions $N \times N$, the 'B' routine makes use of $nz' + 5N$ integer values generated by the 'A' routine, where $nz'$ is the number of non-zeros in the factorised matrix. Usually $nz' > nz$, the number of non-zeros in the original matrix, due to fill-in. The same information, together with the $nz'$ real values of the factorised matrix itself, is required by the 'C' routine in order to solve a linear system involving $J_n$.

In practice, there are occasions when the use of the pivot sequence generated by the 'A' routine leads to failure in the 'B' routine. When this occurs, it is necessary to call the 'A' routine again with the same matrix, so that a new pivot sequence will be chosen.
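This calling discipline can be summarised as follows (an illustrative Python schematic of our own; the callables a_routine and b_routine stand for the corresponding MA28 entries, and the NumericalFailure exception is hypothetical):

```python
# Schematic of the three-routine discipline described above; all names ours.

class NumericalFailure(Exception):
    """Raised when a repeat factorisation with a stale pivot sequence breaks down."""

def factorise_step(J, pivots, a_routine, b_routine):
    """Per-step (re)factorisation: try the cheap 'B' path, fall back to 'A'."""
    if pivots is not None:
        try:
            return b_routine(J, pivots), pivots   # reuse the previous pivot sequence
        except NumericalFailure:
            pass                                   # pivots no longer numerically stable
    factors, pivots = a_routine(J)                 # full Markowitz pivot selection
    return factors, pivots

# The 'C' routine then solves one linear system per right-hand side, using the
# (factors, pivots) pair produced here.
```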
3.3.2. Parallel algorithm implications

Recall from Section 3.2 that the factorisation/refactorisation of the Jacobian at each time step is the task of the JHs, while backsubstitution is performed by the SSs. Thus, when the MA28 routines are used, there is a need for information to be broadcast from the former to the latter.

For synchronisation reasons, we need to ensure that the various JHs that may happen to overlap in time complete their operation in the correct order with respect to the steps that they are handling. This will not necessarily happen automatically, due to differences in processor speeds and also in the pivot sequences employed by each JH. We therefore require that a JH handling the factorisation of the Jacobian at step $n$ waits for the successful completion of the factorisation of the Jacobian at the previous step $n-1$ before transmitting its result to the SSs.

A more serious complication arises with the parallelisation of the repeat factorisation procedure. This may occur when two different JHs are carrying out repeat factorisations on the Jacobian matrices for two consecutive steps, $n-1$ and $n$ respectively, at the same time. In such a case, both handlers will be using a pivot sequence that was determined by an 'A' factorisation at an earlier step $n' < n-1$. Now, for various numerical reasons, it is possible that the factorisation at step $n-1$ fails while that at step $n$ is successful. In such a case, the former factorisation will be repeated using the 'A' routine to determine a new pivot sequence, which will then be transmitted to the Sensitivity Solvers. This may create a synchronisation problem, since the factorisation for step $n$ now corresponds to a pivot sequence that is no longer available to the SSs, having been overwritten by the new one just determined by the JH at step $n-1$. To avoid such difficulties, we need to ensure that in such cases the JH for time step $n$ repeats its factorisation using the new pivot sequence, even if it was successful with the old one.

The basic algorithm for the JH routine, based on these points, is presented in Fig. 2.

Fig. 2. Algorithm for the JH routine.

Some further points to note regarding this algorithm are listed below:
(i) Each JH is characterised by its own unique non-zero identity number.
(ii) At step 2, we check to see if the previous factorisation has in fact finished. If this is the case, we ensure that the pivot sequence we are about to use matches that of the immediately preceding factorisation.
(iii) Propagation of a new pivot sequence around the loop of JHs is achieved by the use of the NewPivotID variable. This is set to the identity number of the JH which generated the sequence (at step 6.2) and is circulated to all other processors in the loop, until the test at step 7 prevents the original JH from receiving back a sequence which it generated itself.
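A Python-flavoured sketch of this loop, reconstructed by us from the description above (it is not the paper's code; every helper name, and the convention that a circulating NewPivotID of 0 means no new sequence is in flight, are assumptions):

```python
# Our reconstruction of the JH loop of Fig. 2; all helper names are
# hypothetical. my_id is the JH's unique non-zero identity number.
def jacobian_handler(my_id, pivots):
    while True:
        n, x_n = recv_point()                       # next accepted step, via the SM
        J = evaluate_jacobian(x_n)                  # evaluate (6) at x_n
        factors, ok = refactorise(J, pivots)        # 'B' routine, current pivot sequence

        # Wait for step n-1 to complete, so that results reach the SSs in step
        # order, and learn whether a new pivot sequence is circulating.
        new_pivot_id, new_pivots = recv_from_predecessor()
        if new_pivot_id == my_id:
            new_pivot_id = 0                        # our own sequence came full circle
        elif new_pivot_id != 0:
            pivots = new_pivots                     # adopt the new sequence and redo the
            factors, ok = refactorise(J, pivots)    # factorisation, even if 'B' succeeded

        if not ok:                                  # 'B' failed: select fresh pivots
            factors, pivots = analyse_factorise(J)  # 'A' routine (Markowitz search)
            new_pivot_id = my_id                    # start circulating the new sequence

        send_to_successor(new_pivot_id, pivots)
        bcast_to_SSs(n, factors, pivots)            # collective send to all SSs
```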
3.4. Load balancing analysis

An important implementational decision concerns the numbers of JH and SS processes to be employed during the computation. Although the parallel algorithm can operate with a single JH, having more than one such process may well allow a higher degree of parallelisation. In particular, it is possible to have the SSs carry out the evaluation of the sensitivities at a step $n$ while the $m$ JHs are evaluating and factorising the Jacobian matrix at steps $n+1, \ldots, n+m$. However, for this to be possible, the DAE integration carried out by the DAE-INT process must be sufficiently fast to have already generated the points $x_{n+j}$, $j = 1, \ldots, m$.

Let the average computational time per integration step be denoted by $\tau_I$, that for Jacobian evaluation and factorisation by $\tau_J$, and the time for forming the right-hand side of (7) and solving this linear system for a single parameter by $\tau_S$.
For most systems of practical interest, we will have

$$ \tau_J > \tau_I > \tau_S \qquad (8) $$
since a typical integration step involves only two to four Newton-type iterations, each involving a function evaluation and a backsubstitution step using a previously factorised Jacobian approximation $\tilde{J}_n$. Under these conditions, the optimal number of JHs is approximately given by

$$ m^* = \frac{\tau_J}{\tau_I} \qquad (9) $$

Furthermore, up to

$$ s^* = \left\lfloor \frac{\tau_I}{\tau_S} \right\rfloor \qquad (10) $$

parameters may be allocated to each SS processor. The total number of processors used would thus be $2 + m^* + \lceil N_p / s^* \rceil$, assuming that enough processors are available.
Fig. 3. Gantt chart for processor utilisation.
To understand these expressions, consider an example involving 5 parameters, in which the times $\tau_J$, $\tau_I$ and $\tau_S$ are in the ratios 6:2:1. Then (9) and (10), respectively, predict that we should have $m^* = 3$ independent Jacobian Handler processes, and that we should allocate $s^* = 2$ parameters to each SS processor. We would therefore need 3 processors for the SSs, leading to a total of 8 processors. Ignoring the overheads of communication, the processor utilisation during the first few steps of the computation will be as shown in the Gantt chart of Fig. 3 (the processor handling the SM is not shown on this diagram). Here, the number above each horizontal segment against the DAE-INT processor and each of the 3 JH processors denotes the integration step $n$ to which the computations currently being carried out pertain. The numbers of the form $n.i$ against the 3 SS processors denote the integration step $n$ and the parameter $i$ currently being treated (i.e. that they are currently solving Eq. (7) for
xi n ). It can be seen that the choices (9) and (10) ensure that all processors remain as occupied as possible. Furthermore, Fig. 3 indicates that the overall computation will, in fact, be completed within the time necessary to complete the integration of the original DAE system (1) plus J s S . Since there will generally be a large number of integration steps involved, this theoretical minimum for the solution time is acceptably close to the time for integration only. Of course, in reality, communication costs cannot be ignored and the optimal values of m and s may have to be determined experimentally. However, the above analysis is useful in demonstrating that these values may both be greater than unity. 3.5. Implementation We have implemented the above parallel algorithm on a Fujitsu AP1000 with 128 `cells', each consisting of a SPARC II processor with 16 Mb of on-board memory. This is a distributed memory 2
parallel computer with relatively efficient interprocess communication: at worst, the time for sending $x$ bytes between cells is $(0.16x + 135)\,\mu$s. In the interests of portability, we used the Message Passing Interface (MPI) standard for inter-process communication [14]. Although MPI allows multi-tasking, we assign only one process to each cell. The MPI construct of 'communicators' and their relatively efficient implementation on the AP1000 machine proved important in the parallelisation of our serial code. By establishing a separate communicator for each JH, consisting of that process together with all of the SSs, it was possible to handle this communication entirely through collective operations.
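This communicator layout can be sketched as follows (using mpi4py as a portable stand-in for the MPI bindings we actually used; the rank layout, with rank 0 for DAE-INT, rank 1 for the SM, then the JHs and the SSs, is an assumption made purely for illustration):

```python
# Sketch of one communicator per JH, each containing that JH plus every SS,
# so that the Jacobian and its factorisation can be sent with a single
# collective broadcast. Rank layout is our assumption, not the paper's.
from mpi4py import MPI

world = MPI.COMM_WORLD
rank, size = world.Get_rank(), world.Get_size()
m = 2                                               # number of JH processes (example)
jh_ranks = range(2, 2 + m)
ss_ranks = range(2 + m, size)

jh_comms = []
for jh in jh_ranks:
    colour = 1 if (rank == jh or rank in ss_ranks) else MPI.UNDEFINED
    jh_comms.append(world.Split(colour, key=rank))  # non-members get COMM_NULL

# Within each such communicator the JH has the lowest world rank, hence local
# rank 0, so a JH broadcasts its factorisation with, e.g.:
#   factors = comm.bcast(factors, root=0)
```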
4. Computational results

To examine the practical performance of our parallel sensitivity technique, we apply it to three DAE systems of differing sizes. All three problems have their origin in the optimal control of chemical engineering systems operating under transient conditions; the number of parameters can be varied by considering different representations of the control variables [15].

The results are presented as tables of timings and processor usage. The first row of each table shows timings for the serial implementation of our sensitivity evaluation code [8] running on a single cell of the AP1000 machine. The first figure in the 'time' column is the CPU time for a single integration when sensitivities are computed, while the bracketed figure is the CPU time needed for the same integration without sensitivity calculation. The difference between these two figures indicates the (invariably significant) computational overhead associated with the evaluation of sensitivities on a serial machine. Moreover, the bracketed figure represents an optimal target for our parallel implementation approach. The remaining rows in each table show timings of the sensitivity integration for the parallel implementation, and are intended to demonstrate the effect of choosing different numbers of processors to be dedicated to JHs and SSs. The four columns on the right show the average reported CPU statistics for each type of processor employed. The first figure in each entry in these columns shows the total CPU usage as a percentage of the wall-clock time for the run, including both computation and communication, while the bracketed figure next to it represents the 'user task' CPU time. The difference between the two indicates the overhead caused by message passing, due to memory copying, buffering etc.

4.1. Problem 1: Semi-batch reactor

This problem concerns the production of a fine chemical from two feedstocks in a semi-batch reactor. The system model is described in terms of 70 differential and algebraic equations. Tables 1 and 2 show timings for this problem with 25 and 55 parameters, respectively.

Table 1
Timing and processor activity for problem 1 with 25 parameters

Processors used       Time (s)           % busy
JH   SS   Total                          DAE-INT       SM           JH            SS
 0    0     1        21.13 (3.61)        -             -            -             -
 1   25    28         3.88               99.1 (90.0)   45.7 (9.2)   73.4 (49.2)   53.2 (19.1)
 2   25    29         3.88               99.0 (89.9)   46.9 (9.1)   43.3 (25.4)   53.3 (19.1)
 1    5     8         3.88               98.8 (89.8)   43.7 (9.2)   74.1 (49.5)   89.3 (72.0)
 1    4     7         4.53               84.9 (76.2)   36.6 (7.8)   62.4 (42.0)   91.3 (78.7)

Table 2
Timing and processor activity for problem 1 with 55 parameters

Processors used       Time (s)           % busy
JH   SS   Total                          DAE-INT       SM           JH            SS
 0    0     1        57.30 (4.86)        -             -            -             -
 1   55    58         5.29               99.1 (82.2)   42.3 (9.0)   67.0 (49.9)   48.9 (17.7)
 2   55    59         5.30               99.0 (89.6)   47.7 (9.7)   43.2 (26.1)   52.8 (19.2)
 1   11    14         5.29               99.1 (87.6)   42.8 (9.6)   71.7 (49.1)   85.3 (68.8)
 1   10    13         5.33               97.9 (88.9)   43.5 (9.6)   73.5 (49.9)   91.7 (75.6)
 1    9    12         5.92               87.5 (78.7)   38.0 (8.5)   65.2 (44.2)   86.4 (73.1)

4.2. Problem 2: Near-ambient grain drying optimisation

This problem is concerned with the determination of the optimal strategy for drying cereal grain by regulating the velocity with which ambient air is driven through a fixed bed of grain. A spatially distributed model [13] is reduced to a system of 154 DAEs by discretising the spatial variations using 4th-order orthogonal collocation over 5 finite elements. Tables 3 and 4 show timings for the evaluation of sensitivities in this problem, employing 27 and 51 parameters, respectively.

Table 3
Timing and processor activity for problem 2 with 27 parameters

Processors used       Time (s)           % busy
JH   SS   Total                          DAE-INT       SM           JH            SS
 0    0     1       180.01 (23.82)       -             -            -             -
 3   27    32        24.46               99.6 (93.3)   27.1 (5.8)   56.3 (47.4)   55.4 (29.2)
 3    9    14        25.50               99.3 (93.1)   26.4 (5.7)   56.2 (47.6)   88.7 (68.6)
 2    9    13        25.49               99.4 (89.8)   25.8 (5.8)   77.2 (67.1)   85.5 (66.2)
 3    7    12        27.92               90.5 (84.7)   24.0 (5.2)   51.1 (43.2)   94.9 (77.7)
 2    7    11        27.89               90.4 (84.0)   24.1 (5.4)   72.2 (62.6)   94.2 (77.1)
 1    7    10        40.14               62.7 (58.9)   17.2 (3.8)   99.0 (84.9)   73.8 (54.1)

Table 4
Timing and processor activity for problem 2 with 51 parameters

Processors used       Time (s)           % busy
JH   SS   Total                          DAE-INT       SM           JH            SS
 0    0     1       394.76 (31.35)       -             -            -             -
 2   51    55        33.40               99.5 (93.6)   25.9 (5.7)   77.4 (67.3)   53.1 (28.0)
 2   17    21        33.43               99.3 (93.5)   25.3 (5.7)   77.1 (67.3)   84.1 (65.2)
4.3. Problem 3: Batch distillation column

This problem, taken from [3], considers the operation of a batch distillation column used to separate a mixture of cyclohexane, n-heptane and toluene. It involves 861 DAEs and 45 parameters. The results are presented in Table 5.

Table 5
Timing and processor activity for problem 3 with 45 parameters

Processors used       Time (s)           % busy
JH   SS   Total                          DAE-INT       SM           JH            SS
 0    0     1      2624.22 (317.22)      -             -            -             -
 1   45    48       412.85               77.6 (76.0)   5.5 (1.8)    96.8 (92.1)   18.0 (11.3)
 2   45    49       320.60               99.8 (98.1)   8.1 (2.3)    66.4 (63.2)   22.9 (14.6)
 2   15    19       320.41               99.8 (98.1)   8.0 (2.3)    66.4 (63.2)   49.0 (40.4)
 2    7    11       330.81               unable to trace due to 16 Mb cell memory limit
 3   15    20       320.50               99.8 (98.1)   8.1 (2.2)    46.9 (44.5)   48.8 (40.4)

4.4. General comments on results

Table 6 shows a number of statistics for each problem studied.

Table 6
Summary of parallel performance over all 5 problems

                        Serial DAE        Best parallel   Additional   Best parallel   Theoretical   Observed
Problem    N      Np    integration (s)   time (s)        overhead     efficiency      m*    s*      m*    s*
1           70    25       3.61              3.88          7.5%         68.1%          1     6       1     6
1           70    55       4.86              5.29          8.8%         82.7%          1     6       1     6
2          154    27      23.82             25.46          6.9%         58.7%          2     2       2     3
2          154    51      31.35             33.40          6.5%         56.2%          2     2       2     3
3          861    45     317.22            320.41          1.0%         72.1%          2     2       2     7

First, we assess the overall performance of the algorithm by comparing the serial time for integration without sensitivity calculation with the best parallel sensitivity time achieved. It can be seen that the additional computational burden associated with the sensitivity evaluation does not exceed 10% in any of these cases. The next column shows the highest level of parallel efficiency that we were able to achieve by optimising the numbers of JHs and SSs; this is defined as the ratio of the serial execution time to the product of the number of processors and the parallel execution time. We note that all these figures exceed 50% and do not decline significantly as problem size increases. Finally, we give both the theoretical and observed values of the optimal load balancing parameters m* and s* defined in Section 3.4. These show reasonable agreement except for the last case, where each of the sensitivity solvers was able to handle considerably more parameters than the underlying timings would suggest. This is probably due to the sensitivity calculations 'catching up' at a number of
discontinuities occurring in this problem, while the integrator process is handling a protracted reinitialisation calculation.

The results in Tables 1, 2, 3 and 5 indicate a significant communication overhead incurred by the JH and, particularly, the SS processes. It is possible that the use of the MPI library for the AP1000, rather than the 'native' communication mode, may have contributed to this overhead. However, we felt that there would be little justification for producing a non-portable implementation of our code in pursuit of a slight overall efficiency gain.

5. Concluding remarks

This paper has considered the parallelisation of the computation of sensitivities of differential-algebraic systems. Because of its practical importance, but also because of its often excessive computational cost, we believe that this is indeed an area that can benefit substantially from the proper application of parallel computation.

An interesting feature of the algorithm proposed for this purpose is that parallelism is achieved both within a single time integration step (by the SSs) and across multiple steps (by the JHs). A theoretical analysis leading to optimal values for the numbers of processors in each category has been presented. Experimental results obtained from three problems of various sizes broadly agree with this analysis, and provide ample evidence for the effectiveness of our parallelisation strategy.

Practical experience with serial implementations of control vector parameterisation (CVP) algorithms for the solution of dynamic optimisation problems (see, for instance, [15]) indicates that most of the time is expended on sensitivity evaluations. Consequently, the substantial acceleration of the latter task should result in very significant improvements in the performance of CVP algorithms, and in their wider applicability, especially in online applications.

It is worth noting that Maly and Petzold [9] have very recently proposed a novel algorithm for determining sensitivities of DAE systems that does not require the integration of the sensitivity equations (3); a finite difference representation of these equations is solved instead. The difficulty of parallelising the solution of (3) is cited as a major incentive for considering such alternatives. However, as demonstrated in the present paper, it is possible to overcome this difficulty through the use of an appropriate parallel algorithm.

Finally, our aim in the present paper has been to compute the sensitivities of a DAE system without significantly increasing the time taken for an integration. Whilst this has been achieved, the question naturally arises of how the DAE integration process itself can be accelerated. Feasible approaches include the parallelisation of lower-level operations such as the evaluation of equation residuals and Jacobians, and the linear algebra operations. Most of these have already been studied quite extensively, especially in the context of solving large sets of nonlinear algebraic equations (see, for instance, [11]).

Acknowledgements

The financial support of the EPSRC through its grant to the Centre for Process Systems Engineering at Imperial College is gratefully acknowledged.
References

[1] K.E. Brenan, S.L. Campbell, L.R. Petzold, Numerical Solution of Initial Value Problems in Differential-Algebraic Equations, North-Holland, 1989.
[2] M. Caracotsios, W.E. Stewart, Sensitivity analysis of initial value problems with mixed ODEs and algebraic equations, Comput. Chem. Engng. 9 (1985) 359–365.
[3] M.S. Charalambides, Optimal Design of Integrated Batch Processes, PhD thesis, University of London, 1996.
[4] I.S. Duff, MA28 – a set of Fortran subroutines for sparse unsymmetric linear equations, Technical Report AERE-R8730, AERE Harwell, 1979.
[5] I.S. Duff, J.K. Reid, The design of MA48 – a code for the direct solution of sparse unsymmetric linear systems of equations, ACM Transactions on Mathematical Software 22 (1996) 187–226.
[6] A.M. Dunker, The decoupled direct method for calculating sensitivity coefficients in chemical kinetics, J. Chem. Phys. 81 (1984) 2385–2393.
[7] C.W. Gear, Simultaneous numerical solution of differential-algebraic equations, IEEE Trans. Circuit Theory CT-18 (1971) 89–95.
[8] R.B. Jarvis, C.C. Pantelides, DASOLV – a differential-algebraic equation solver, Technical report, Imperial College, 1992.
[9] T. Maly, L.R. Petzold, Numerical methods and software for sensitivity analysis of differential-algebraic systems, Appl. Num. Math. 20 (1996) 57–79.
[10] H.M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Science 3 (1957) 255–269.
[11] J.R. Paloschi, Steady state process simulation on MIMD machines: solving nonlinear equations, Comput. Chem. Engng. 19S (1995) S721–S728.
[12] H. Rabitz, M. Kramer, D. Dacol, Sensitivity analysis in chemical kinetics, Annu. Rev. Phys. Chem. 34 (1983) 419–461.
[13] Y. Sun, C.C. Pantelides, Z.S. Chalabi, Mathematical modelling and simulation of near-ambient grain drying, Computers and Electronics in Agriculture 13 (1995) 243–271.
[14] The MPI Forum, MPI: A Message-Passing Interface Standard, University of Tennessee, 1994.
[15] V.S. Vassiliadis, R.W.H. Sargent, C.C. Pantelides, Solution of a class of multistage dynamic optimisation problems. Part I – Problems without path constraints, Indust. Engng. Chem. Res. 33 (1994) 2111–2122.