Sparse matrix methods for equation-based chemical process flowsheeting—I


Computers & Chemical Engineering, Vol. 8, No. 1, pp. 9-18, 1984. 0098-1354/84 $3.00 + .00. Pergamon Press Ltd. Printed in Great Britain.



SPARSE MATRIX METHODS FOR EQUATION-BASED CHEMICAL PROCESS FLOWSHEETING-I. REORDERING PHASE

MARK A. STADTHERR* and E. STEPHEN WOOD†
Chemical Engineering Department, University of Illinois, Urbana, IL 61801, U.S.A.

(Received 26 August 1982; received for publication 24 January 1983)

Abstract-Four new sparse matrix reordering algorithms, varying in degree of sophistication, are described. These and five other algorithms, including the P4 and hierarchical partitioning algorithms, are compared on the basis of reordering time and performance. Two of the new techniques offer overall improvement on randomly generated matrix structures, while another performs extremely well on flowsheeting matrices.

Scope-The drawbacks to the well-known sequential-modular approach to process flowsheeting are today increasingly being recognized. When applied to complex processes with a number of recycle streams, the sequential-modular approach may become extremely slow, due in part to the nesting of iteration loops in the computation. It is equally important to note that the sequential-modular approach does not efficiently handle problems in which a number of design constraints are imposed, nor is it well-suited to the solution of process optimization problems. A promising alternative for overcoming these limitations is the so-called equation-based approach. In this case the entire process is represented by a large set of generally nonlinear equations that are then solved simultaneously. Today at least five prototype equation-based flowsheeting systems in various stages of development exist: SPEEDUP (Imperial College)[1], ASCEND II (Carnegie-Mellon)[2], QUASILIN (University of Cambridge)[3], FLOWSIM (University of Connecticut/Control Data)[4, 5], and SEQUEL (University of Illinois)[6]. Many of the aspects involved in equation-based flowsheeting are discussed in detail by Westerberg et al.[7], Shacham et al.[5], and Stadtherr & Hilton[6]. The central computational problem in equation-based flowsheeting is the solution of a very large nonlinear equation system. While initial work in equation-based flowsheeting typically involved solution by tearing, the recent trend is toward solution by simultaneous linearization.
In this case all the equations are linearized and all the variables iterated on simultaneously, using Newton-Raphson or some related technique. This approach requires the periodic solution of a large, sparse, linear equation system ultimately involving perhaps tens of thousands of equations. Thus arises the need, emphasized in more detail in [7], for sparse matrix techniques capable of efficiently solving such huge systems. This paper is the first in a series of reports in which we consider in detail the computational efficiency of sparse matrix algorithms and associated codes, paying particular attention to performance on matrices with the structure inherent in flowsheeting problems. We focus here on the reordering techniques used for solving sparse matrix problems. In our experience, and that of others[5,8], a priori reordering methods appear preferable in dealing with large matrices with a flowsheeting structure. We compare here five published reordering techniques, namely PREORD[9], P4[10], HP, HP10 and HP20[11], as well as four new techniques, including a new algorithm designed specifically for flowsheeting problems. The comparisons are based on both the speed of reordering and the efficiency with which the solution is found once the reordering is complete. In the second paper in this series we concentrate on solution algorithms and codes. Conclusions and Significance-When simultaneous linearization is applied to the solution of the very large nonlinear equation systems encountered in equation-based chemical process flowsheeting, a key step in the computational procedure is the solution of a very large and sparse system of linear equations. One phase in the solution of such systems is to reorder the matrix in such a way that it can be solved without requiring excessive core storage or computation time. 
Based on comparisons of reordering speed and performance, we conclude that, of the several methods considered, the new BLOKS reordering algorithm is dramatically better than the others for matrices arising in flowsheeting problems. This is because in achieving a desirable reordering BLOKS recognizes and attempts to maintain the natural block structure of the flowsheeting problem. For randomly structured matrices, the new SPK1 and SPK2 algorithms, as well as the well-known P4 algorithm, are most attractive.

*Author to whom correspondence should be addressed.
†Present address: Amoco Chemical Co., Naperville, IL 60540, U.S.A.


BACKGROUND

Consider the solution of the linear equation system Ax = b where A is large, sparse, and of order n. The sparse matrix A is typically stored using some sort of "packed" storage scheme in which only the nonzero elements are stored. Several such schemes are reviewed by Duff[12] and Tewarson[13]. Generally these schemes require roughly either 2r or 3r storage locations, where r is the number of nonzero elements in A. The usual solution procedure can be represented by the factorization A = LU, where L and U are respectively lower and upper triangular factors. The triangular factors or their equivalent are typically calculated using Gaussian elimination or some variation thereof. This is equivalent to calculating the elimination form of the inverse (EFI). For a large and sparse A, the number of nontrivial nonzeros in L and U may greatly exceed the original number of nonzeros in A, a phenomenon generally known as "fill-in". A major concern in the development of sparse matrix methods is the reduction of such fill-in, since the loss of sparsity may lead to excessive storage requirements and considerable increases in computation time. In many, perhaps most, applications, especially those arising in the discretization of PDE problems, the sparse matrix A has rather well-defined structural and numerical properties, such as symmetry, bandedness, and positive definiteness. A considerable number of special-purpose solution techniques have been developed with these applications in mind. Flowsheeting matrices are not so well-defined either structurally or numerically and thus require the use of general-purpose solution techniques. Nevertheless, as discussed below, flowsheeting matrices do have a certain inherent structure that can be used to advantage in selecting and developing sparse matrix techniques. General-purpose sparse matrix routines in general involve a reordering phase and a numerical phase.
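The "packed" storage idea described above can be made concrete with a short sketch. This is a minimal row-wise layout in the spirit of the schemes reviewed by Duff[12] and Tewarson[13], not any specific published scheme; the function name pack and the dense list-of-lists input format are illustrative assumptions.

```python
# Minimal sketch of a row-wise "packed" storage scheme: only the r
# nonzeros are kept, as parallel value / column-index arrays plus n + 1
# row pointers -- roughly 2r locations, consistent with the 2r-3r range
# quoted above.

def pack(dense):
    """Pack a dense matrix (list of lists) into value/index/pointer arrays."""
    values, col_index, row_start = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_index.append(j)
        row_start.append(len(values))
    return values, col_index, row_start

# A 3 x 3 matrix with r = 4 nonzeros.
A = [[2, 0, 0],
     [0, 3, 1],
     [4, 0, 0]]
values, col_index, row_start = pack(A)
print(values)     # [2, 3, 1, 4]
print(col_index)  # [0, 1, 2, 0]
print(row_start)  # [0, 1, 3, 4]
```

Row i of the matrix is then recovered from positions row_start[i] through row_start[i + 1] of the packed arrays.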
In the reordering phase the rows and columns of A are reordered in such a way that sparsity will be maintained during the numerical phase, in which the actual elimination is performed. Reordering is of course equivalent to selecting a sequence of elements on which to pivot during the numerical phase. Reordering techniques are generally classified as either a priori or local. When a priori techniques are used the solution is obtained in two passes: first all or part (rows only or columns only) of the reordering phase is completed, after which the numerical phase begins. When local techniques are used the solution is obtained in only one pass, during which one alternates between the reordering and numerical phases.


Though a large number of local reordering strategies have been proposed[12], the most popular, and most effective[12, 14], remains the simple Markowitz criterion[15]. In our experience, however, as detailed in the second part[16] of this series, a two-pass approach is better suited to the very large sparse systems encountered in flowsheeting. Thus, we henceforth concentrate on this approach. One group of a priori reordering techniques are those compared by Duff & Reid[17]. None of the techniques they consider performs especially well, though in general they result in improvements over no reordering at all. On this basis they recommend that one adopt the simplest technique, which is to reorder the columns in order of increasing column count cj during the first pass, where cj is the number of nonzeros in column j. Rows are then reordered locally during the second pass. Alternatively, in the first pass one could reorder rows according to increasing row count ri and reorder columns locally in the second pass. This latter strategy is used by Sherman[9], and we include his first-pass algorithm PREORD in the comparisons below. A better group of a priori reordering techniques are those that reorder the matrix into a bordered triangular form (BoTF) or its equivalent. This is desirable because during the numerical phase fill-in can be limited to certain areas of the borders. In general these reordering techniques involve the identification of certain rows and columns whose removal from the matrix leaves a triangular matrix. In the mathematical programming literature such rows and columns are typically known as "spikes", while in the chemical engineering literature they are typically known as "tears". For purposes of analysis it is usually convenient to regard such rows and columns as the borders in a BoTF, as shown in the occurrence matrix in Fig. 1(b). For actual computational purposes it is usually more convenient to use the equivalent "spike form", as shown in Fig. 1(c), which we refer to here as implicit-BoTF (i-BoTF). When Gaussian elimination or its equivalent is used in the numerical phase, fill-in can occur only in the spike columns, and only below the top nonzero in each such column. Thus the amount of potential fill-in in a spike depends on the length of the spike (the "distance" from the top nonzero in the spike to the bottom row). A priori reordering techniques in this group include the well-known P3 algorithm[18] and its descendant P4[10], as well as the hierarchical partitioning algorithms HP, HP10 and HP20[11]. Also included in this group are the several algorithms (e.g. [19, 20]) in the


Fig. 1. (a) Occurrence matrix. (b) Reordering of matrix in (a) to bordered triangular form (BoTF). (c) Reordering of matrix in (a) to “spike form” or implicit-BoTF (i-BoTF). Note that the spike columns (1, 4 and 2) and the spike rows (5, 10 and 7) constitute the borders in the corresponding BoTF in (b).

chemical engineering literature for selecting tear variables in the solution of a large nonlinear system by tearing. The original purpose of such tearing algorithms was typically to find a minimal, or at least small, set of tears, allowing the large nonlinear system to be solved by iterating only on this relatively small set of tear variables. However, since the tear variables and equations correspond to the borders in a BoTF, such tearing algorithms may also be used as a priori reordering techniques for limiting fill-in in solving large, sparse linear systems. Unfortunately these tearing algorithms tend to be prone to combinatorial problems when applied to very large matrices, and thus are not especially well suited for use in a priori reordering schemes. In our comparisons below we consider a modification and simplification of one tearing algorithm[20]. Since all but one of the techniques considered here involve a reordering to i-BoTF, we now concentrate on the details of such schemes.

REORDERING ALGORITHMS

We first discuss briefly a few techniques common to most of the algorithms.

Partitioning

In this context, partitioning refers to reordering a matrix into block-triangular form (BTF). A matrix that cannot be so reordered is referred to as irreducible. The diagonal blocks in a BTF are presumed to be irreducible. For a given reducible matrix, the BTF is unique in the sense that the number of blocks, as well as the rows and columns in each block, are fixed. Partitioning requires two steps. First the matrix is reordered so that a zero-free diagonal (or maximal transversal) is obtained. Then a symmetric permutation can be performed to put the matrix into BTF. Duff[12] reviews several of the techniques that have been proposed for performing each step. For obtaining a zero-free diagonal the algorithm described and implemented by Duff[21, 22] is very effective, and for the final symmetric permutation to BTF the algorithm of Tarjan[23] is regarded as the best available. Duff and Reid[24, 25] provide a very efficient implementation of Tarjan's algorithm.

Forward and backward triangularization

These are very simple techniques used to triangularize as much of a matrix as possible. For instance, if there is a row with only one nonzero, it could be the first diagonal element in a lower triangular matrix. Thus to forward triangularize: (1) Search for a row with a row count of one. Remove that row and the column in which the one nonzero occurs and put them in the first open positions in the reordered matrix. If all row counts are greater than one then go to Step (3). (2) Adjust the row counts to account for the removal of the column in Step (1). Return to Step (1). (3) End. Similarly, if there is a column with only one nonzero, it could be the last diagonal element in a lower triangular form. Thus to backward triangularize: (1) Search for a column with a column count of one. Remove that column and the row in which the one nonzero occurs and put them in the last open positions


in the reordered matrix. If all column counts are greater than one then go to Step (3). (2) Adjust the column counts to account for the removal of the row in Step (1). Return to Step (1). (3) End. We now proceed to describe the a priori reordering algorithms considered in this study. In the case of algorithms published elsewhere, we provide only a simple outline, for purposes of comparison to the new algorithms proposed here.

P4[10]

Note that what is listed here is a somewhat simplified statement of the P4 algorithm. However it is the complete algorithm that is implemented and used in the numerical comparisons presented below. (1) Backward triangularize. Forward triangularize. (2) Partition the remaining matrix and put irreducible blocks on the block stack. Process the first block on the stack by proceeding to Step (3). (3) Select a spike column and put it on the spike stack. Select as the spike the column that has the maximum number of intersections with rows of minimum row count. If the maximum number of such intersections is one, and there is more than one column having one intersection with a row of minimum row count, then choose among these by taking the column intersecting with the most rows of next-to-minimum row count. Break other ties by choosing the column with the greatest column count. (4) Readjust row counts to account for removal of the spike, and forward triangularize. If a row with no nonzeros occurs, assign it to the spike column on top of the stack, pop that spike from the stack, and place it and the assigned row in the next open positions in the reordered block. If this completes the block, go to Step (6). (5) Go to Step (3). (6) Pop the next block from the block stack and process it by going to Step (3). If there are no blocks left on the stack, go to Step (7). (7) End. It should be noted that Step (4) in the complete algorithm is more involved than indicated here, in that an effort is made to look ahead to complete the block.
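The spike-selection rule of Step (3) in the simplified listing above can be sketched as follows. This is a minimal illustration, not the complete P4 implementation: it applies only the primary criterion (most intersections with rows of minimum row count) and the column-count tie-break, omitting the next-to-minimum-row-count rule and all look-ahead; the dense boolean pattern representation and the name select_spike are our assumptions.

```python
def select_spike(pattern):
    """Simplified P4 Step (3): pick the column intersecting the most
    rows of minimum row count; break ties by greatest column count."""
    n = len(pattern)
    row_count = [sum(r) for r in pattern]
    active = [i for i in range(n) if row_count[i] > 0]
    rmin = min(row_count[i] for i in active)
    min_rows = [i for i in active if row_count[i] == rmin]
    col_count = [sum(pattern[i][j] for i in range(n)) for j in range(n)]
    best, best_key = None, None
    for j in range(n):
        if col_count[j] == 0:
            continue  # column already removed from the active block
        hits = sum(pattern[i][j] for i in min_rows)
        key = (hits, col_count[j])          # primary rule, then tie-break
        if best_key is None or key > best_key:
            best, best_key = j, key
    return best

# Rows 0, 2, 3 have the minimum row count of 2; columns 0 and 1 each hit
# two of them, and column 1 wins the tie with the greater column count.
P = [[1, 1, 0, 0],
     [0, 1, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
print(select_spike(P))  # 1
```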
Also P4 is essentially a modification of the earlier P3 algorithm[18], the main modification being the inclusion of partitioning in P4. Though P4 includes some look-ahead, the spike selection and subsequent reordering are based primarily on local considerations, without considering the overall structure of the reordered matrix. On the other hand the hierarchical partitioning algorithms, described next, make decisions based on a criterion that attempts to quantify the desirability of the overall reordered matrix structure. This is not to say that the P4 algorithm will not identify such a desirable structure, only that it does not specifically search for it.

HP[11]

As in the case of P4, though only a simplified outline of HP is given here, the complete algorithm is the one implemented and used in the numerical comparisons below. (1) Forward triangularize. Backward triangularize. (2) Partition the remaining matrix and put irreducible blocks on the block stack. Process the first block on the stack by proceeding to Step (3).


(3) Make initial spike column selection using the same criteria as used in P4. (4) Adjust row counts and forward triangularize. (5) Make initial spike row selection. The selection criteria are the transpose of those used in Step (3) (i.e. replace row with column and vice versa in the spike column selection criteria). (6) Adjust column counts and backward triangularize. (7) Partition the remaining part of the block into irreducible sub-blocks. Calculate the performance index P, defined to be the sum of the squares of the sub-block orders. (8) Search for a combination of spike column and row that yields the minimum P, applying an exclusion criterion to reduce the number of combinations examined. (Each combination requires partitioning in order to evaluate P.) (9) Reexamine all columns with column counts greater than that of the spike column selected in Step (8). If more than one of these columns would result in the same value of P if chosen as spike, then choose as spike the column with the greatest column count. If any irreducible sub-blocks have been found as a result of partitioning the part of the block remaining after final spike row and column selection, then place these on the block stack. (10) Pop the next block from the block stack and process it by going to Step (3). If there are no more blocks on the stack proceed to Step (11). (11) End. It should be noted that Steps (8) and (9) may involve several applications of partitioning and thus may be quite time consuming. Thus Lin & Mah[11] proposed HP10 and HP20, two simplified versions of HP. We also propose a further simplification that we call HP30.

HP10[11]

This is the same as the HP algorithm, except that in Step (8) the number of spike column and row combinations considered is reduced. This increases the reordering speed, but the value of P arrived at may no longer be an absolute minimum.
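The performance index P used in Steps (7)-(9) can be sketched as follows, under the assumption that the pattern already has a zero-free diagonal, so that the irreducible sub-blocks correspond to the strongly connected components of the pattern's digraph. For simplicity this sketch finds the components with Kosaraju's two-pass method rather than the Tarjan implementation recommended earlier; perf_index is a hypothetical name, and the recursive traversal limits it to small patterns.

```python
def perf_index(pattern):
    """P = sum of squares of the orders of the irreducible diagonal
    sub-blocks (strongly connected components of the digraph of a
    pattern with zero-free diagonal)."""
    n = len(pattern)
    adj = [[j for j in range(n) if j != i and pattern[i][j]] for i in range(n)]
    radj = [[] for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            radj[j].append(i)
    order, seen = [], [False] * n
    def dfs1(u):                      # first pass: finish order
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                dfs1(v)
        order.append(u)
    for u in range(n):
        if not seen[u]:
            dfs1(u)
    comp = [-1] * n
    def dfs2(u, c):                   # second pass: label components
        comp[u] = c
        for v in radj[u]:
            if comp[v] == -1:
                dfs2(v, c)
    c = 0
    for u in reversed(order):
        if comp[u] == -1:
            dfs2(u, c)
            c += 1
    sizes = [comp.count(k) for k in range(c)]
    return sum(s * s for s in sizes)

# Two irreducible 2 x 2 sub-blocks, so P = 2^2 + 2^2 = 8.
P = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1],
     [0, 0, 1, 1]]
print(perf_index(P))  # 8
```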
HP20[11]

This is the same as the HP algorithm except that Step (8) is omitted in order to further increase reordering speed relative to HP10.

HP30

We propose this algorithm as a further simplification of HP. It is the same as HP except that both Step (8) and Step (9) are omitted. We now present three other new algorithms of varying sophistication: SPK1, SPK2 and BLOKS. The last is designed to exploit the inherent block structure of flowsheeting matrices.

SPK1

(1) Forward triangularize. Backward triangularize. (2) Partition the remaining matrix and put irreducible blocks on the block stack. Process the first block on the stack by proceeding to Step (3). (3) Select a spike column or columns and put them on the spike stack. The criteria for spike selection are:

(a) Find the row with minimum row count. If there is a tie, then for each row involved, sum the column counts of the columns having nonzeros in that row. Break the tie by selecting the row with the largest such sum. (b) Now consider only the columns having nonzeros in the row selected in Step (3a). The selected row is assigned to the column with the smallest column count, and they are placed in the next open positions in the reordered block. The column with the largest column count is then placed on the spike stack, followed by the column with the next largest column count, etc., until all the remaining columns with nonzeros in the selected row have been placed on the spike stack. (4) Readjust row counts and forward triangularize. If a row with no nonzeros occurs, assign it to the column on top of the spike stack, pop that spike from the stack, and put it and the assigned row into the next open positions in the reordered block. If this completes the block, go to Step (6). (5) Go to Step (3). (6) Pop the next block from the block stack and process it by going to Step (3). If there are no blocks left on the stack proceed to Step (7). (7) End. The SPK1 algorithm is less sophisticated than P4 in that it offers no look-ahead. This also permits a considerable increase in reordering speed.

SPK2

This algorithm is based in part on the ideas used in the tearing algorithm described by Stadtherr et al.[20], which attempts to locate small sub-blocks (in this case 2 x 2). The SPK2 algorithm is the same as SPK1 except in the spike selection in Step (3). For SPK2, Step (3) is: (3) Select a spike column or columns and put them on the spike stack. The criteria for spike selection are: (a) Find the row with minimum row count. If there is a tie, then determine for each contending row the number of rows of minimum row count that would result if all the columns with nonzeros in the contending row were deleted. Choose the contending row yielding the most such rows of minimal row count.
(b) If there is still a tie among rows with minimum row count, use the tie-breaking procedure used in Step (3a) of SPK1. (c) Now consider only the columns having nonzeros in the row selected in Steps (3a) and (3b). Assign the selected row and the corresponding columns to the reordered matrix and to the spike stack as in Step (3b) of SPK1. In comparison to P4, SPK2 offers somewhat more look-ahead, by analyzing the effect of deleting all the columns in a row of minimum row count when selecting the spike column. As a result SPK2 will find all 2 x 2 sub-blocks in selecting a spike column. This is desirable because it in general reduces the length of the spikes and thus reduces potential fill-in. P4 attempts to find the 2 x 2 sub-blocks but may fail, while SPK1 may also find them, but only by chance. A common criticism of a priori reordering schemes when applied to flowsheeting matrices is that they do not exploit the inherent block structure of the flowsheeting problem. The next algorithm proposed is designed to take advantage of this structure.
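The SPK1 selection step, Steps (3a)-(3b) above, can be sketched as a single function. This is a hedged illustration on a dense boolean pattern, not the authors' implementation: spk1_select is a hypothetical name, and it returns the selected row, the column assigned to it, and the columns pushed on the spike stack (largest column count first).

```python
def spk1_select(pattern):
    """Sketch of SPK1 Step (3): returns (row, assigned_col, spikes)."""
    n = len(pattern)
    rc = [sum(r) for r in pattern]
    cc = [sum(pattern[i][j] for i in range(n)) for j in range(n)]
    rows = [i for i in range(n) if rc[i] > 0]
    rmin = min(rc[i] for i in rows)
    # Step (3a): minimum row count; tie broken by largest sum of the
    # column counts of the columns with nonzeros in the row.
    row = max((i for i in rows if rc[i] == rmin),
              key=lambda i: sum(cc[j] for j in range(n) if pattern[i][j]))
    cols = [j for j in range(n) if pattern[row][j]]
    # Step (3b): the row is assigned to its smallest-count column; the
    # rest go on the spike stack in order of decreasing column count.
    assigned = min(cols, key=lambda j: cc[j])
    spikes = sorted((j for j in cols if j != assigned), key=lambda j: -cc[j])
    return row, assigned, spikes

# Rows 1 and 3 tie at row count 2; row 3 wins the tie (column-count sum
# 6 vs 5), is assigned to column 2, and column 3 becomes a spike.
P = [[1, 1, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 0, 1, 1]]
print(spk1_select(P))  # (3, 2, [3])
```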

BLOKS

This algorithm is for use on systems for which a relatively sparse block-occurrence matrix can be constructed. (1) Identify the block structure of the matrix and construct the corresponding block-occurrence matrix. For flowsheeting matrices, this procedure is discussed in some detail in the next section. (2) Apply the SPK2 algorithm to the block-occurrence matrix. (3) Expand the reordered block-occurrence matrix into a standard occurrence matrix on the equation-variable level. (4) Forward triangularize. Backward triangularize. Set k = 1. (5) Define as "active" for spike selection the remaining rows in the kth block-row in the reordered block-occurrence matrix. If no rows remain in the kth block-row go to Step (7). (6) Apply Steps (3)-(5) of the SPK1 algorithm, except restrict Step (3a) (of SPK1) to those rows currently defined as active. Note that in Step (4) (of SPK1) all rows are considered for forward triangularization, not just those active for spike selection. When the last active row has been reordered, exit from Step (4) (of SPK1) and proceed to the next step below rather than to Step (6) of SPK1. (7) If k = N, where N is the number of block-rows, go to Step (8). Otherwise set k = k + 1 and go to Step (5). (8) End. Note that since the block-occurrence matrix will in general be much smaller than the actual occurrence matrix, the block reordering in Step (2) will be relatively inexpensive to obtain. Similarly, by restricting spike selection to a relatively small number of active rows at a time, as determined by the overall block structure, the equation-variable reordering expense can be kept relatively low. As the results presented below demonstrate, by exploiting the block structure of a matrix in this fashion, the reordering times for flowsheeting matrices can be drastically reduced, while maintaining excellent results in terms of storage and computation time during the numerical phase.

Maintaining numerical stability

The various sparse matrix reordering schemes essentially select a pivot sequence based on some criterion chosen to maintain sparsity. For full-matrix algorithms the pivot sequence is generally determined based on some criterion chosen to maintain numerical stability. This usually involves selecting as the pivot the element with the largest absolute value in the current row or column (partial pivoting) or in the entire remaining matrix (complete pivoting). Unfortunately the goals of maintaining sparsity and numerical stability may be mutually exclusive. If, in using the pivot sequence chosen by a sparse reordering scheme, an element much smaller in magnitude relative to other elements in its row is pivoted on during the numerical phase, then numerical instabilities may arise. This dilemma is usually resolved by applying some type of threshold pivoting method. For instance one may define an acceptable pivot to be any element larger in magnitude than PTOL times the largest magnitude element in the same row, where PTOL is


a user-supplied pivot tolerance. If in using an a priori reordering to i-BoTF a chosen pivot element does not meet the threshold test above, then a column interchange must be performed to provide an acceptable pivot. The techniques used to maintain stability during the numerical phase of solving a sparse matrix are discussed in more detail in the next part[16] of this series.

BLOCK STRUCTURE OF FLOWSHEETING MATRICES

Because of the unit-stream nature of flowsheeting problems, the resulting matrices have an inherent and easily identifiable block structure. However there are a number of ways in which a block-occurrence matrix for use in the BLOKS algorithm could be constructed. The construction chosen will depend on the equation-variable formulation used, and on the type of output produced by the equation-generating routine. We describe here the technique used in this study to construct the block-occurrence matrices for the flowsheeting test problems. In the following procedure each equilibrium stage in a staged operation is treated as a separate unit. (1) Assign to each unit a number of block-columns equal to the number of input and output streams associated with the unit. Each block-column comprises the variables describing one of the input or output streams. Any internal variables associated with the unit are assigned to one of the block-columns. Note that any block-column representing a connecting stream will be assigned to two units. (2) Assign to each unit a number of block-rows equal to the number of output streams. The equations describing the unit could in principle be divided among block-rows arbitrarily, but in general we suggest that the equations be assigned so that each block-row has roughly the same number of equations, and so that as many zero-blocks as possible are created. For example, consider a flash unit. It is assigned three block-columns and two block-rows. By putting all the material balances and the energy balance in the same block-row, one zero-block can be obtained. Similarly for an equilibrium stage there are four block-columns and two block-rows, and one zero-block can be obtained. (3) At this point specification equations have not been accounted for. Since these may involve variables in any of the block-columns, we add one or more full block-rows to the block-occurrence matrix.
Since the SPK2 algorithm used by BLOKS to reorder the block-occurrence matrix is designed for reordering a square matrix, the number of full block-rows added is the number needed to make the block-occurrence matrix square. It should be noted that the a priori reordering methods used will essentially ignore a full row, since it must be reordered last. Thus the decrease in block-sparsity caused by adding full block-rows is immaterial. Since most specifications will involve only one variable, these equations will generally be put first in the equation-variable reordering by the forward triangularization in Step (4) of BLOKS. As a final example consider the Cavett problem flowsheet as shown in Fig. 2. Each mixer is assigned one block-row and three (or four) block-columns; each flash has two block-rows and three block-

Fig. 2. Flowsheet for Cavett problem used as example in constructing a block-occurrence matrix. Units 1 and 4 are mixers; units 2, 3, 5 and 6 are flash drums.

A spike is active in a given row if its topmost element is in or above that row. For example, in Fig. 4(a), row 2 has two active spikes, namely columns 8 and 10, and row 4 has four active spikes, namely columns 6, 7, 8 and 10. The significance of the relation between the number of spikes and the maximum number of local spikes is illustrated in Fig. 4. The reorderings in Fig. 4(a) and (b) both have four spikes. However the maximum number of local spikes is four in Fig. 4(a), and only two in Fig. 4(b). In general, when the maximum number of local spikes is small relative to the number of spikes, as in Fig. 4(b), the reordering contains a significant amount of sub-block structure. For instance Fig. 4(b) has three 2 x 2 sub-blocks on the diagonal. On the other hand, when the maximum number of local spikes is nearly the same as the number of spikes, as in Fig. 4(a), the reordering typically contains little block structure. Since the randomly structured matrices have no inherent block structure, even the best reorderings will resemble Fig. 4(a) and have little or no block structure. However, for flowsheeting matrices, one simple way of evaluating reordering schemes is to compare the number of spikes and the maximum number of local spikes. Good reorderings should tend to exploit the inherent block structure, and thus in general a good reordering should resemble Fig. 4(b) and have a maximum number of local spikes much smaller than the number of spikes.
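The structural criteria discussed here, the number of spikes and the maximum number of local spikes, together with the spike lengths that bound potential fill-in in an i-BoTF, can be computed directly from a reordered occurrence pattern. A minimal sketch, assuming the spike columns are already known and a dense boolean pattern; spike_profile is a hypothetical name:

```python
def spike_profile(pattern, spikes):
    """Return (number of spikes, maximum number of local spikes,
    spike lengths). A spike is active in row i if its topmost nonzero
    is in or above row i; the local spikes of row i are the active
    spikes to the right of the diagonal. Spike length is the distance
    from the topmost nonzero to the bottom row, an upper bound on the
    fill-in that spike can suffer."""
    n = len(pattern)
    top = {j: min(i for i in range(n) if pattern[i][j]) for j in spikes}
    max_local = max(sum(1 for j in spikes if j > i and top[j] <= i)
                    for i in range(n))
    lengths = {j: n - 1 - top[j] for j in spikes}
    return len(spikes), max_local, lengths

# 5 x 5 example: columns 3 and 4 are spikes, with topmost nonzeros in
# rows 0 and 1 respectively.
P = [[1, 0, 0, 1, 0],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1]]
print(spike_profile(P, [3, 4]))  # (2, 2, {3: 4, 4: 3})
```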

Fig. 3. Block-occurrence matrix for flowsheet in Fig. 2.

columns. To account for specifications and to square up the matrix, one full block-row is added. The resulting block-occurrence matrix is shown in Fig. 3. Note that as discussed above each flash unit has a zero-block associated with it.

COMPARISON OF REORDERING ALGORITHMS

Test problems

Two types of test problems are used, those with a randomly generated structure and those with a flowsheeting structure. The randomly structured matrices are generated with an average of 3.5 nonzeros per row. The row and column counts are distributed normally about the average of 3.5, with the constraint that there be at least one nonzero per row and per column. The matrix sizes considered range from 50 to 1200 equations. Note that the BLOKS algorithm is not applicable to these matrices. The flowsheeting matrices are generated from the flowsheets described in [26]. The flowsheet structures are taken from problems used by Sood & Reklaitis[27] and Motard & Westerberg[28]. However, the units involved are limited to mixers, splitters, heaters, flash drums, equilibrium stages with ideal thermodynamics, and equilibrium stages with nonideal thermodynamics. Thus the flowsheets do not correspond to any actual process. This provides flexibility in generating test problems and is consistent with the philosophy that since the development of general-purpose flowsheeting techniques is the desired goal, numerical comparisons should not be tied to any particular existing processes. The matrix sizes range from 152 to 2158 equations.
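A generator of the kind described for the randomly structured test matrices can be sketched as follows. This is not the authors' generator: the paper draws row and column counts from a normal distribution, whereas for brevity this sketch places nonzeros uniformly at random and enforces the per-row and per-column constraint with a random permutation; random_pattern and its arguments are hypothetical.

```python
import random

def random_pattern(n, avg=3.5, seed=0):
    """Sketch of a random test-matrix generator: scatter about n*avg
    nonzeros with at least one nonzero per row and per column.
    (Simplified relative to the normally distributed counts described
    in the text -- an assumption.)"""
    rng = random.Random(seed)
    pattern = [[0] * n for _ in range(n)]
    # A random permutation guarantees every row and column is covered.
    perm = list(range(n))
    rng.shuffle(perm)
    for i, j in enumerate(perm):
        pattern[i][j] = 1
    placed, target = n, int(avg * n)
    while placed < target:
        i, j = rng.randrange(n), rng.randrange(n)
        if not pattern[i][j]:
            pattern[i][j] = 1
            placed += 1
    return pattern

P = random_pattern(50)
print(sum(map(sum, P)) / 50)  # average nonzeros per row: 3.5
```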

Numerical comparison criteria

The ultimate test of an a priori reordering scheme is the resulting computational efficiency of the numerical phase. In the comparisons below we include the amount of fill-in and the operation count (multiplications and divisions) for calculating the EFI and the solution for one b-vector. Also of obvious significance is the reordering time for each algorithm.
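Both quantities can be predicted symbolically from the structure alone, before any arithmetic is performed. The sketch below is our illustration: for elimination in the given order it counts one division per eliminated subdiagonal entry plus one multiplication per updated entry, and assumes no accidental numerical cancellation.

```python
def symbolic_lu_cost(rows):
    """Fill-in and multiply/divide count for Gaussian elimination
    applied in the given order; rows[i] is the set of column indices
    of the nonzeros in row i (0-indexed)."""
    n = len(rows)
    A = [set(r) for r in rows]             # working copy of the structure
    fill = ops = 0
    for k in range(n):
        tail = {j for j in A[k] if j > k}  # pivot-row entries right of diagonal
        for i in range(k + 1, n):
            if k not in A[i]:
                continue                   # nothing to eliminate in this row
            ops += 1 + len(tail)           # one divide, one multiply per entry
            new = tail - A[i]
            fill += len(new)               # entries created where zeros were
            A[i] |= new
            A[i].discard(k)
    return fill, ops
```

Running this on the same structure under two different orderings shows directly how much fill-in and work a reordering saves.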


Structural comparison criteria

The results presented below include two structural criteria, the number of spikes and the maximum number of "local" spikes. For a given row, the number of local spikes is defined to be the number of "active" spikes to the right of the diagonal.

Fig. 4. (a) Occurrence matrix showing maximum number of local spikes same as total number of spikes. (b) Occurrence matrix showing small maximum number of local spikes relative to total number of spikes. Note 2 x 2 block structure on diagonal.

In order to make comparisons with these reordering phase times, we also present the time required for the solution of the system for a single b-vector, using our code LU1SOL. As demonstrated in [16] this is an extremely efficient code for implementing row-by-row Gaussian elimination on a matrix reordered a priori to i-BoTF. The times given are in CPU msec on a CDC Cyber 175. Array storage for the numerical phase was limited to 110,000 words of core. Since we wish to isolate here the performance of the various a priori reordering algorithms, and not include the effect of any additional reordering due to whatever threshold pivoting scheme is used to maintain numerical stability, the numerical phase results presented are for problems on which no threshold pivoting was performed, columns being interchanged during the numerical phase only as needed to avoid zero pivots. This amounts to setting PTOL = 0. For PTOL > 0 the computational expense for the numerical phase increases by an amount dependent on the value of PTOL and on the threshold pivoting scheme used[16].

Randomly structured matrices

The numerical results for the randomly structured matrices are listed in Table 1.

Table 1. Results for several reordering algorithms on randomly generated matrices. The number in parentheses under the entry for matrix order is the total number of nonzeros in the matrix. Times are in CPU milliseconds on a CDC Cyber 175. Columns: algorithm (NONE, PREORD, SPK1, P4, SPK2, HP30, HP20, HP10, HP); max. local spikes; spikes; reorder time (msec); EFI no. of fill-in; EFI no. of operations; solution time (msec). Matrix orders range from 50 to 1200.

Table 2. Summary of reordering performance on randomly generated matrices. The per cent excess fill-in and excess operations were found by calculating for each algorithm the percentage by which fill-in and operations exceeded the minimum found for each problem, and then averaging the results for each algorithm over the number of problems solved. Note that, as indicated in Table 1, some of the reordering algorithms could not be used on all six problems. Columns: method (NONE, PREORD, SPK1, P4, SPK2, HP30, HP20, HP10, HP); per cent excess fill-in; per cent excess operations.

Note that some reorderings were not attempted on some of the larger problems, since simpler algorithms had already proven very expensive. There are two important observations to be made. First, all of the seven algorithms based on a reordering to i-BoTF result in comparable performance in the numerical phase, and all significantly outdo the much simpler PREORD in this regard, as emphasized in Table 2. Secondly, we note the wide variation in reordering times. Excluding PREORD, which results in an inferior numerical phase performance, SPK1 consistently gives the best reordering times, with SPK2 and P4 next. The times for the HP algorithms seem unacceptable, especially in comparison to solution times. The simplest HP algorithm, HP30, is the only one for which reordering time does not grow faster than solution time as problem size increases. Reasons for this are discussed in more detail below. The obvious conclusion is that since the performance of the reordering algorithms except PREORD is roughly the same, one should choose among the faster algorithms, namely SPK1, SPK2 and P4. For problems to be solved for only one b-vector we recommend SPK1; it is faster than SPK2 or P4 by about a factor of three on the average, and its performance is not significantly worse. For problems to be solved for several b-vectors, the somewhat better performance of P4 or SPK2 may be desirable. The SPK2 algorithm, which involves a somewhat more sophisticated look-ahead scheme than P4, performed especially well on this set of test problems.
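The trade-off for multiple b-vectors arises because the factorization is computed once, and each additional right-hand side then costs only a forward and a back substitution on the stored factors, so a better reordering (cheaper, sparser factors) pays off repeatedly. A minimal dense illustration of that reuse (our own sketch, without pivoting and assuming nonzero pivots; the actual codes work on sparse structures):

```python
def lu_factor(A):
    """Doolittle LU (no pivoting): returns a matrix holding L below
    the diagonal (unit diagonal implied) and U on and above it."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                  # multiplier l_ik
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def lu_solve(LU, b):
    """Solve A x = b using factors from lu_factor; cheap per b-vector."""
    n = len(LU)
    y = b[:]
    for i in range(n):                            # forward substitution: L y = b
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):                  # back substitution: U x = y
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x
```

Factoring once and calling lu_solve for each b-vector amortizes the reordering-dependent factorization cost over all right-hand sides.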

Flowsheeting matrices

The numerical results for the flowsheeting matrices are shown in Table 3. Here we refer to the case of no reordering as "natural", since as the matrix comes from the equation generator it already has a certain amount of desirable order, simply because of the natural process-block structure of the problem. Clearly the most attractive reordering scheme is the BLOKS algorithm proposed above. Not only does it result in excellent numerical phase performance, but it is by far the fastest of the schemes that perform well. The SPK1, P4 and HP30 algorithms perform inconsistently. They perform well on most problems, but very poorly on occasion. Although it is not observed in these test problems, we have noted an occasional very poor performance from the SPK2 algorithm as well, though less frequently than with SPK1, P4 and HP30. The occasional poor reorderings appear to result when the algorithms start selecting successive spikes from different process blocks, thus in effect destroying the natural process-block structure of the problem. For instance, on the 1060-equation problem the natural ordering is better than that from SPK1. The somewhat better performance of SPK2 may be explained by noting that it does search for some block structure, though only 2 x 2's. HP, HP10 and HP20 all search for a desirable overall block structure, and apparently as a result perform consistently well on these flowsheeting problems. However this very search for block structure results in the unacceptably long reordering times required by the HP algorithms. The problem lies in the number of times partitioning is performed in order to search for blocks. Even in HP30, which requires many fewer partitionings than the other HP algorithms, the partitioning routines account for 30-50% of the total reordering time. It should be noted that our implementations of these algorithms use simple adaptations of the very effective Harwell library subroutines MC13 and MC21 for partitioning. We also note that in implementing HP30 it is possible during spike selection and triangularization to maintain a large number of nonzeros on the diagonal, thus reducing the amount of time spent in the transversal selection (MC21) phase of partitioning.

While the HP algorithms must search for a desirable block structure, the new BLOKS algorithm starts with a natural block structure and tries to maintain it while still achieving a desirable overall equation and variable reordering. The effectiveness with which BLOKS maintains block structure can be seen in the relatively small maximum number of local spikes compared to the total number of spikes, especially for the larger problems. If there is a disadvantage to the BLOKS algorithm it is that it requires the user to define a block-occurrence matrix. This however can very easily be done as part of the equation generating routine in the equation-based flowsheeting package. The new BLOKS reordering algorithm can be highly recommended not only for flowsheeting problems, but for other very large systems for which a sparse block-occurrence matrix can be constructed.

Availability of codes

It is our intention to make our reordering-phase codes SPK1, SPK2, BLOKS and HP30, as well as our implementation of P4, and the new numerical-phase codes described in [16], available as part of a new sparse matrix package SPAR2PAS. More information regarding the availability of SPAR2PAS can be obtained by contacting Professor Stadtherr.

Acknowledgements-This work has been supported by the National Science Foundation under Grant CPE 80-12428. We also acknowledge the invaluable assistance of our colleague, Dr. Hem-Shann Chen, who provided the algorithm and code for the HP30 reordering, as well as the LU1SOL solution code.


Table 3. Results for several reordering algorithms on flowsheeting matrices. The number in parentheses under the matrix order is the total number of nonzeros in the matrix. Times are in CPU milliseconds on a CDC Cyber 175. Columns: algorithm (NATURAL, PREORD, SPK1, P4, SPK2, HP30, HP20, HP10, HP, BLOKS); max. local spikes; spikes; reorder time (msec); EFI no. of fill-in; EFI no. of operations; solution time (msec).


REFERENCES

1. J. D. Perkins & R. W. H. Sargent, SPEEDUP: A computer program for steady-state and dynamic simulation and design of chemical processes. In Selected Topics on Computer-Aided Process Design and Analysis (Edited by R. S. H. Mah & G. V. Reklaitis), AIChE Symp. Series 78(214), 1 (1982).
2. D. R. Benjamin, M. H. Locke & A. W. Westerberg, Interactive programs for process design. Design Research Center, Carnegie-Mellon University, Rep. No. DRC-06-28-81 (1981).
3. E. W. Gorczynski, H. P. Hutchison & A. R. M. Wajih, Development of a modularly organized equation-oriented process simulator. Comput. Chem. Engng 3, 353 (1979).
4. P. D. Babcock, Equation oriented flowsheet simulation: Opening a new world of process simulation. Presented at A.I.Ch.E. Annual Meeting, New Orleans, 8-12 Nov. 1981.
5. M. Shacham, S. Macchietto, L. F. Stutzman & P. Babcock, Equation oriented approach to process flowsheeting. Comput. Chem. Engng 6, 79 (1982).
6. M. A. Stadtherr & C. M. Hilton, Development of a new equation-based process flowsheeting system: Numerical studies. In Selected Topics on Computer-Aided Process Design and Analysis (Edited by R. S. H. Mah & G. V. Reklaitis), AIChE Symp. Series 78(214), 12 (1982).
7. A. W. Westerberg, H. P. Hutchison, R. L. Motard & P. Winter, Process Flowsheeting. Cambridge University Press, Cambridge (1979).
8. T. D. Lin & R. S. H. Mah, A sparse computation system for process design and simulation: I. Data structures and processing techniques. A.I.Ch.E.J. 24, 830 (1978).
9. A. H. Sherman, Algorithm 533. NSPIV, a Fortran subroutine for sparse Gaussian elimination with partial pivoting. Trans. Math. Software 4, 391 (1978).
10. E. Hellerman & D. Rarick, The partitioned preassigned pivot procedure (P4). In Sparse Matrices and Their Applications (Edited by D. J. Rose & R. A. Willoughby). Plenum Press, New York (1972).
11. T. D. Lin & R. S. H. Mah, Hierarchical partition-a new optimal pivoting algorithm. Math. Programming 12, 260 (1977).
12. I. S. Duff, A survey of sparse matrix research. Proc. IEEE 65, 500 (1977).
13. R. P. Tewarson, Sparse Matrices. Academic Press, New York (1973).
14. I. S. Duff, Practical comparison of codes for the solution of sparse linear systems. In Sparse Matrix Proceedings-1978 (Edited by I. S. Duff & G. W. Stewart). SIAM, Philadelphia (1979).
15. H. M. Markowitz, The elimination form of the inverse and its application to linear programming. Management Sci. 3, 255 (1957).
16. M. A. Stadtherr & E. S. Wood, Sparse matrix methods for equation-based chemical process flowsheeting: II. Numerical phase. Comput. Chem. Engng 8, 19 (1984).
17. I. S. Duff & J. K. Reid, A comparison of sparsity orderings for obtaining a pivotal sequence in Gaussian elimination. J. Inst. Math. Appl. 14, 281 (1974).
18. E. Hellerman & D. Rarick, Reinversion with the preassigned pivot procedure. Math. Programming 1, 195 (1971).
19. J. H. Christensen, The structuring of process optimization. A.I.Ch.E.J. 16, 177 (1970).
20. M. A. Stadtherr, W. A. Gifford & L. E. Scriven, Efficient solution of sparse sets of design equations. Chem. Engng Sci. 29, 1025 (1974).
21. I. S. Duff, On algorithms for obtaining a maximum transversal. Trans. Math. Software 7, 315 (1981).
22. I. S. Duff, Algorithm 575. Permutations for a zero-free diagonal. Trans. Math. Software 7, 387 (1981).
23. R. Tarjan, Depth-first search and linear graph algorithms. SIAM J. Comput. 1, 146 (1972).
24. I. S. Duff & J. K. Reid, An implementation of Tarjan's algorithm for the block-triangularization of a matrix. Trans. Math. Software 4, 137 (1978).
25. I. S. Duff & J. K. Reid, Algorithm 529. Permutations to block-triangular form. Trans. Math. Software 4, 189 (1978).
26. E. S. Wood, Two-pass strategies for sparse matrix computation in chemical process flowsheeting problems. Ph.D. Thesis, University of Illinois, Urbana (1982).
27. M. K. Sood & G. V. Reklaitis, User's Manual: Material Balance Program-II, 2nd Edn. School of Chemical Engineering, Purdue University, Lafayette (1975).
28. R. L. Motard & A. W. Westerberg, Exclusive tear sets for flowsheets. A.I.Ch.E.J. 27, 725 (1981).

(1978). 26. E. S. Wood, Two-pass strategies for sparse matrix computation. in chemical process flowsheeting problems. Ph.D. Thesis. Universitv of Illinois. Urbana (1982). 27. M. K. Sood &G. V. Reklaitis, User’s Manual: Material Balance Program-II, 2nd Edn. School of Chemical Engineering, Purdue University, Lafayette (1975). 28. R. L. Motard & A. W. Westerberg, Exclusive tear sets for flowsheets. A.I.Ch.E.J. 27, 725 (1981).