Comput. Methods Appl. Mech. Engrg. 189 (2000) 825–840

www.elsevier.com/locate/cma

Preconditioned Krylov subspace methods for solving nonsymmetric matrices from CFD applications ☆

Jun Zhang 1

Department of Computer Science, University of Kentucky, 773 Anderson Hall, Lexington, KY 40506-0046, USA

Received 20 July 1998

Abstract

We conduct an experimental study on the behavior of several preconditioned iterative methods for solving nonsymmetric matrices arising from computational fluid dynamics (CFD) applications. The preconditioned iterative methods consist of Krylov subspace accelerators and a powerful general purpose multilevel block ILU (BILUM) preconditioner. The BILUM preconditioner and an enhanced version of it are slightly modified versions of the originally proposed preconditioners. They will be used in combination with different Krylov subspace methods. We choose to test three popular transpose-free Krylov subspace methods: BiCGSTAB, GMRES and TFQMR. Numerical experiments, using several sets of test matrices arising from various relevant CFD applications, are reported. © 2000 Elsevier Science S.A. All rights reserved.

AMS classification: 65F10; 65F50; 65N06; 65N55

Keywords: Multilevel preconditioner; Krylov subspace methods; Nonsymmetric matrices; CFD applications

1. Introduction

A challenging problem in computational fluid dynamics (CFD) is the efficient solution of large sparse linear systems of the form

    Ax = b,                                                        (1)

where A is a nonsymmetric matrix of order n. For example, such systems may arise from finite element or finite volume discretizations of various formulations of the 2D or 3D incompressible Navier–Stokes equations. The large size of the problems usually precludes direct solution methods based on Gaussian elimination, due to their prohibitive memory requirements. Iterative methods are believed to be the only viable means of solution in such applications. However, not all iterative methods are suitable and efficient for solving general CFD problems. Matrices arising from CFD applications, with finite element (volume) discretizations on unstructured domains, are typically nonsymmetric and unstructured. The lack of regular structure limits the efficient use of certain relaxation based fast iterative methods, such as the standard multigrid method. As a result, Krylov subspace methods become the natural choice for these problems. Furthermore, many realistic matrices are ill-conditioned, and standard Krylov subspace methods applied to such matrices converge very slowly or sometimes diverge. The uncertainty in the performance of iterative

☆ This research was supported in part by the University of Kentucky Center for Computational Sciences.
1 E-mail address: [email protected] (J. Zhang). URL: http://www.cs.uky.edu/jzhang

0045-7825/00/$ - see front matter © 2000 Elsevier Science S.A. All rights reserved. PII: S0045-7825(99)00345-X


methods is currently the major impediment to the acceptance of such promising techniques in industrial applications. Fortunately, the reliability and robustness of iterative methods can be improved dramatically by the use of preconditioning techniques. In fact, preconditioning is critical in making iterative methods practically useful [4]. Thus, in practice, large sparse nonsymmetric linear systems are often solved by Krylov subspace methods coupled with a suitable preconditioner [17]. A preconditioning process consists of auxiliary operations that solve a linear system with the matrix A approximately. The preconditioner can itself be a direct solver associated with a nearby matrix based on some incomplete factorization, or a few iterations of an iterative technique involving A, such as the standard relaxation type methods. Although existence can only be guaranteed for the restricted class of M-matrices, preconditioners have been developed and are successfully used in many applications. Many efficient preconditioners are tailored to specific applications, and their efficient use is restricted to those applications. Although there may not exist a preconditioner that is efficient for all applications, there is still strong interest in developing so-called "general purpose" preconditioners that are efficient for a large group of applications. The multilevel block ILU (BILUM) preconditioner is one such example [20]. This domain based multilevel preconditioner uses Schur complement strategies that are common in domain decomposition techniques. The aim of this paper is to study the performance of several promising Krylov subspace methods preconditioned by a modified BILUM in solving a few nonsymmetric matrices arising from various realistic CFD applications. Our studies are empirical, and our goal is to give an indication of how the performance of the Krylov subspace methods is affected by the quality of the preconditioner.
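As a toy illustration of the idea that a preconditioner is an auxiliary approximate solve with A, the sketch below (our own example, not taken from the paper) uses a Jacobi preconditioner, i.e. a solve with diag(A) standing in for the incomplete-factorization solves discussed above, inside a simple residual-correction iteration; the matrix and all names are hypothetical:

```python
import numpy as np

# A small strictly diagonally dominant test matrix (our own example).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.ones(3)

def richardson(A, b, M_inv, iters):
    """Residual-correction iteration x <- x + M^{-1}(b - Ax),
    where M^{-1} is the (approximate) preconditioner solve."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + M_inv @ (b - A @ x)
    return x

M_inv = np.diag(1.0 / np.diag(A))   # Jacobi: "solve" with diag(A) instead of A
x = richardson(A, b, M_inv, 60)
```

Without the preconditioner (M^{-1} = I) this same iteration diverges here, since the entries of A exceed 1 in magnitude; it is the quality of the approximate solve that makes the scheme work, which is the theme of the paper.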
We hope to give some evidence to support a general consensus (among preconditioning technique practitioners) that it is the quality of the preconditioner that determines the failure or success of a preconditioned iterative scheme. Thus more effort should be invested in the design of robust and efficient preconditioners. This paper is organized in five sections. Section 2 briefly discusses the Krylov subspace methods. Section 3 outlines the proposed high accuracy multilevel block ILU preconditioner. Section 4 contains our numerical experiments on several collections of nonsymmetric matrices from realistic CFD applications. Concluding remarks are given in Section 5.

2. Krylov subspace methods

The past few decades have seen a flourishing of iterative techniques, many of which are the so-called Krylov subspace methods. They are usually referred to as parameter-free iterative methods, because they are free from the choice of parameters such as those required in successive over-relaxation (SOR) type methods [17]. Since we concentrate on solving nonsymmetric problems, a representative group of Krylov subspace methods for this purpose includes: the conjugate gradient method applied to the normal equations (CGN) [12]; the generalized minimum residual method (GMRES) [18]; the biconjugate gradient stabilized method (BiCGSTAB) [26]; and a transpose-free variant of the quasi-minimal residual method (TFQMR) [8]. There are other methods that are variants of these, but they are not discussed here due to space limitations and due to the fact that their behavior more or less resembles that of one of the methods under consideration. Furthermore, since CGN requires the availability or action of A^T, the transpose of A, which in some cases is costly, inconvenient or impossible, we do not consider CGN in this paper.
An introduction to these and other modern iterative methods can be found in [17], a recent survey of iterative solution methods for linear systems is given in [9], and algorithms and the availability of computer codes are outlined in [1]. We believe the three iterative methods BiCGSTAB, GMRES and TFQMR are the most promising among the Krylov subspace methods and are representative. Over the past years, efforts have been invested in comparing various Krylov subspace methods; see, e.g., [13] for some theoretical discussions, where no method was found to be the best for all problems tested. An early comparison of Krylov subspace methods preconditioned by ILU(0) and MILU was reported in [14], where the slightly more elaborate MILU was found to be much less efficient than the standard ILU(0). A comparison of preconditioned nonsymmetric Krylov subspace methods on a large scale MIMD machine was reported in [25], where some standard preconditioners such as ILU(0) and relaxation methods (Jacobi) were employed and uniform discretization PDE problems were tested. A very recent application involving the three Krylov subspace methods under


current consideration with a specialized preconditioner was reported in [3] for certain problems in the neutron diffusion equation. Most of the tests so far have shown that there is no clear winner among the three Krylov subspace methods, a conclusion that is in line with that of [13]. However, in terms of operation costs, BiCGSTAB seems to be slightly cheaper than the others [3]. Recently, in connection with the development of sparse approximate inverse preconditioners, these Krylov subspace methods have been used as accelerators and compared in [2,11]. Once again, BiCGSTAB was shown to be faster than the other two in most cases. In spite of these results, there seems to be no comparison of preconditioned Krylov subspace methods involving a more powerful general purpose preconditioner and more realistic nonsymmetric problems from CFD applications. High accuracy preconditioners are relatively new and have mostly been tested with GMRES [6,4]. A comprehensive comparison of Krylov subspace methods with modern preconditioning techniques will undoubtedly offer information that may help researchers and application engineers choose suitable iterative schemes for their own problems.

3. Multilevel block ILU preconditioner

The BILUM preconditioning techniques are based on a reordering of the matrix A into a two-by-two block format

    PAP^T = ( D  F )
            ( E  C ),                                              (2)

where P is a permutation matrix. The matrix P is chosen such that D is a block diagonal matrix and is easily invertible. Such an ordering can be found by forming a block independent set. A block independent set is a set of groups (blocks) of unknowns such that there is no coupling between unknowns of any two different groups (blocks) [20]. Although the sizes of the blocks in a block independent set do not have to be uniform, blocks of uniform size are commonly searched for, for the sake of easy programming and of potential load balancing in parallel implementations.
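For intuition, a block independent set can be found greedily: grow a block of up to b coupled unknowns, then lock out all neighbors of the finished block so that no later block can couple to it. The sketch below is our own illustration (not the search procedure of [20]); blocks may come out smaller than b, whereas the paper searches for uniform sizes:

```python
def block_independent_set(adj, b):
    """Greedy block independent set for a matrix graph given as an
    adjacency list; no edge of `adj` joins two different blocks."""
    n = len(adj)
    state = [0] * n              # 0 = free, 1 = in a block, 2 = locked out
    blocks = []
    for seed in range(n):
        if state[seed] != 0:
            continue
        block, frontier = [seed], [seed]
        state[seed] = 1
        while frontier and len(block) < b:
            u = frontier.pop()
            for v in adj[u]:
                if state[v] == 0 and len(block) < b:
                    state[v] = 1
                    block.append(v)
                    frontier.append(v)
        for u in block:          # lock out free neighbors of the new block
            for v in adj[u]:
                if state[v] == 0:
                    state[v] = 2
        blocks.append(block)
    return blocks

# Path graph 0-1-2-3-4-5 with blocks of size 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
blocks = block_independent_set(adj, 2)   # [[0, 1], [3, 4]]
```

The locked-out unknowns (here 2 and 5) are exactly those that form the reduced system in the next section.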
In BILUM, the unknowns of the block independent set are listed first and the rest of the unknowns form a reduced system. The block independent set ordering induces a block LU factorization of the form

    ( D  F )   ( I        0 ) ( D  F   )
    ( E  C ) = ( ED^{-1}  I ) ( 0  A_1 ) = LU,                     (3)

where A_1 = C − ED^{-1}F is the Schur complement with respect to C and I is the generic identity matrix. Since C, E, D and F are sparse matrices, A_1 is also sparse in general. In large applications, A_1 may still be too large to be solved inexpensively and further reduction may be needed. In order to maintain the sparsity of the LU factors, a dropping strategy is adopted when computing the Schur complement A_1, based on some given threshold tolerance, and the resulting factorization is incomplete. The BILUM preconditioner is obtained by recursively applying the above procedure to the successive Schur complements up to a certain number of levels. The last reduced system obtained is then solved by a direct method or a preconditioned iterative method [20].

There are basically two kinds of dropping strategies used to control the sparsity of the reduced systems. The first is called the single dropping strategy: small elements whose absolute values are less than the average of the nonzero absolute values of the current row times a threshold tolerance τ are discarded [20]. This strategy sometimes fails to control the amount of fill-in for large scale applications. A more aggressive double dropping strategy was introduced in [19] to control the amount of fill-in. First, the single dropping strategy is applied; then only the p elements of largest absolute value in each row are kept. The advantage of the double dropping strategy is that the amount of fill-in of the resulting preconditioner can be predicted in advance.

Inverting blocks. In [20], the inverse of the block diagonal matrix D is obtained by inverting each individual (small) block exactly.
For indefinite matrices, some blocks may be near-singular and ill-conditioned, and a singular value decomposition (SVD) based regularized inverse technique may be used to invert the blocks approximately [23]. Here we also give a different version of the SVD regularized inverse. Suppose a matrix B has an SVD of the form [10, pp. 16-17]

    B = U Σ V^T,                                                   (4)

where U and V are two orthogonal matrices and Σ = diag[σ_1, σ_2, ..., σ_s], with σ_1 ≥ σ_2 ≥ ... ≥ σ_s ≥ 0. See Theorem 2.3-1 of [10, pp. 16-17] for details on the SVD factorization. It is well known that ‖B‖_2 = σ_1. If B is ill-conditioned or near-singular, some of its small singular values are close to zero. The inverse of B can be expressed as

    B^{-1} = V Σ^{-1} U^T,

with Σ^{-1} = diag[σ_1^{-1}, σ_2^{-1}, ..., σ_s^{-1}]. Thus, some elements of Σ^{-1} may be very large if some of the singular values are small. We use a regularization approach which consists of perturbing only the smallest singular values to obtain a stabilized inverse. A similar strategy was advocated in [5,23]. Although the underlying idea is similar, our current implementation uses a slightly different formulation. We replace the smallest singular values of Σ by larger values. Specifically, given a threshold parameter ω > 0 and denoting σ̃ = Σ_{j=1}^{s} σ_j, the singular values σ_i such that σ_i < ωσ̃ are replaced by ωσ̃. Hence, the matrix B is perturbed as

    B̃ = U Σ̃ V^T,

where

    Σ̃ = diag[σ_1, σ_2, ..., σ_q, ωσ̃, ..., ωσ̃].

The inverse matrix B^{-1} is correspondingly replaced by

    B̃^{-1} = V Σ̃^{-1} U^T,                                        (5)

where

    Σ̃^{-1} = diag[σ_1^{-1}, σ_2^{-1}, ..., σ_q^{-1}, 1/(ωσ̃), ..., 1/(ωσ̃)].

Note that the 2-norm condition number of the matrix B is

    κ_2(B) = ‖B‖_2 ‖B^{-1}‖_2 = σ_1 σ_s^{-1},

while the 2-norm condition number of B̃ is

    κ_2(B̃) = ‖B̃‖_2 ‖B̃^{-1}‖_2 = σ_1/(ωσ̃) < ω^{-1}.

Thus the 2-norm condition number of B̃ is no worse than ω^{-1}. However, the perturbation from B to B̃ introduces errors in the factorization. The 2-norm of the error resulting from this perturbation is governed by the threshold parameter ω and the sum of the singular values. The following property is easy to prove.

Proposition 3.1. Let B be a square matrix with the singular value decomposition (4) and B̃^{-1} be a square matrix with the singular value decomposition (5). Then I − BB̃^{-1} is of rank n − q and

    ‖I − BB̃^{-1}‖_2 = 0                   if ω ≤ σ_s/σ̃,
    ‖I − BB̃^{-1}‖_2 = 1 − σ_s/(ωσ̃)       otherwise.

It may be difficult to select an appropriate ω. Note that any ω such that 0 ≤ ω ≤ σ_s/σ̃ will yield the exact inverse. A large value of ω will result in an inaccurate approximate inverse. In our experiments the use of the SVD regularized inverse enabled us to solve some problems that were otherwise very hard to solve, and to stabilize the iteration process in certain cases, as will be seen in the next section.
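The regularized inverse and the error bound of Proposition 3.1 are easy to check numerically. The sketch below is our own NumPy rendering of the formulas above (not the paper's Fortran code); the test block B is a hypothetical near-singular example:

```python
import numpy as np

def regularized_inverse(B, omega):
    """SVD regularized inverse: singular values below omega * sum(sigma)
    are replaced by omega * sum(sigma) before inverting."""
    U, sigma, Vt = np.linalg.svd(B)
    sigma_tilde = sigma.sum()
    floor = omega * sigma_tilde
    sigma_reg = np.where(sigma < floor, floor, sigma)
    return Vt.T @ np.diag(1.0 / sigma_reg) @ U.T

# A near-singular diagonal block (hypothetical example).
B = np.diag([3.0, 2.0, 1e-9])
Binv = regularized_inverse(B, omega=0.1)
```

For this B, σ̃ ≈ 5, so the smallest singular value is lifted to ωσ̃ = 0.5; the perturbed block has κ_2(B̃) = σ_1/(ωσ̃) = 6 ≤ ω^{-1} = 10, while ‖I − BB̃^{-1}‖_2 equals 1 − σ_3/(ωσ̃), exactly as in Proposition 3.1.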


Preconditioning process. The preconditioning process (application of BILUM) consists of a block forward elimination step and a block backward substitution step [20,19]. At each level j, we partition the vector x_j as

    x_j = ( y_j     )
          ( x_{j+1} ),                                             (6)

corresponding to the two-by-two block matrix (2), and perform the following steps:

Algorithm 3.1 (Application of the BILUM preconditioner).
1. Copy the right-hand side vector b to x_0.
2. For j = 0, 1, ..., L − 1, do forward elimination sweep:
3.     Apply the permutation P_j to x_j to partition it in the form of (6).
4.     x_{j+1} := −E_j D_j^{-1} y_j + x_{j+1}.
5. End do.
6. Solve with a relative tolerance ε:
7.     A_L x_L := x_L.
8. For j = L − 1, ..., 1, 0, do backward substitution sweep:
9.     y_j := D_j^{-1} (−F_j x_{j+1} + y_j).
10.    Apply the inverse permutation P_j^T to the solution y_j.
11. End do.
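In matrix terms, Algorithm 3.1 can be sketched as below (our own dense NumPy illustration; for clarity all permutations P_j are taken to be the identity, and `schur_solve` stands in for the last-level solve that the paper performs approximately):

```python
import numpy as np

def bilum_apply(Dinv, E, F, schur_solve, b):
    """Apply a BILUM-style multilevel preconditioner.
    Dinv, E, F are per-level lists; schur_solve solves the last system."""
    saved = []                      # the y_j parts peeled off on each level
    x = b.copy()
    for Dj_inv, Ej, Fj in zip(Dinv, E, F):
        m = Dj_inv.shape[0]
        y, xr = x[:m], x[m:]
        saved.append(y)
        x = xr - Ej @ (Dj_inv @ y)  # forward elimination sweep
    x = schur_solve(x)              # last-level solve
    for Dj_inv, Fj, y in zip(reversed(Dinv), reversed(F), reversed(saved)):
        y = Dj_inv @ (y - Fj @ x)   # backward substitution sweep
        x = np.concatenate([y, x])
    return x

# One-level check: with exact block inverses and an exact last-level
# solve, the preconditioner application is an exact solve with A.
rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.standard_normal((n, n)) + 6.0 * np.eye(n)
D, F_ = A[:m, :m], A[:m, m:]
E_, C = A[m:, :m], A[m:, m:]
Dinv0 = np.linalg.inv(D)
S = C - E_ @ Dinv0 @ F_            # Schur complement A_1
b = rng.standard_normal(n)
x = bilum_apply([Dinv0], [E_], [F_], lambda r: np.linalg.solve(S, r), b)
```

With dropping and an approximate last-level solve, as in the paper, the same sweeps produce only an approximate solve with A, which is exactly what a preconditioner application is.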

The solution on the last level may not need to be exact. In [20], the coarsest level solution is obtained by applying several iterations of a preconditioned GMRES. The drawback of seeking an iterative solution on the coarsest level is that the preconditioner changes at each iteration. This feature requires that the iterative accelerator be able to accommodate a variable preconditioner. Such accelerators are available as flexible variants of GMRES, such as FGMRES [15] and GMRESR [27]. However, there seem to be no similar variants of BiCGSTAB and TFQMR. On the other hand, seeking an exact solution on the coarsest level is neither necessary nor computationally cheap. We decided to compromise by using a high accuracy ILUT(τ, p) preconditioner [16] as an approximate solver on the coarsest level, i.e., A_L is approximately factored as L_L U_L and Lines 6 and 7 in Algorithm 3.1 are replaced by

6-7. Solve L_L U_L x_L := x_L.

In this way, although we still have an approximate solution on the coarsest level, the preconditioner is the same at all iterations and all three iterative accelerators can be accommodated. The resulting preconditioner is referred to as modified BILUM. However, in the rest of this paper, when we mention BILUM, we actually mean the modified BILUM. For convenience, we give a complete algorithm for constructing the BILUM preconditioner:

Algorithm 3.2 (Construction of the BILUM preconditioner).
1. Given the block size b and parameters τ, p, ω, L.
2. For j = 0, 1, ..., L − 1, do:
3.     Find P_j by a block independent set ordering with blocks of uniform size b.
4.     Perform the matrix permutation Ã_j = P_j^T A_j P_j.
5.     Extract the submatrices D_j, F_j, E_j and C_j from Ã_j.
6.     Compute D_j^{-1} using the SVD regularized inverse technique with parameter ω.
7.     Compute the approximate Schur complement A_{j+1} ≈ C_j − E_j D_j^{-1} F_j with double dropping parameters p, τ.
8.     Store the submatrices D_j^{-1}, F_j and E_j.
9. End do.
10. Compute L_L U_L for A_L using ILUT(τ, p).
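The double dropping strategy used in Step 7 above can be sketched row-by-row as follows (our own illustration of the rule described in Section 3, not the code of [19,20]): first drop entries below the row-average threshold, then keep only the p largest in magnitude:

```python
import numpy as np

def double_drop(row, tau, p):
    """Apply single dropping (threshold tau times the mean nonzero
    magnitude of the row), then keep only the p largest magnitudes."""
    nz = row[row != 0.0]
    if nz.size == 0:
        return row.copy()
    tol = tau * np.abs(nz).mean()
    kept = np.where(np.abs(row) >= tol, row, 0.0)
    keep_idx = np.argsort(np.abs(kept))[::-1][:p]
    out = np.zeros_like(row)
    out[keep_idx] = kept[keep_idx]
    return out

row = np.array([5.0, 0.001, 1.0, 2.0, 0.002, 3.0])
dropped = double_drop(row, tau=0.1, p=3)   # keeps 5, 3 and 2
```

The second stage is what makes the fill-in predictable in advance: each row of the reduced system carries at most p entries regardless of how the threshold stage behaves.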


Storage cost. On each level j, we only store the sparse matrices D_j^{-1}, F_j and E_j, for j = 0, 1, ..., L − 1. The product E_j D_j^{-1} and the Schur complement A_j are not stored; the action of E_j D_j^{-1} is obtained by applying D_j^{-1} followed by E_j. The approximate Schur complement A_j is computed, used, and discarded. If we assume that the matrix A is presparsified so that the number of elements in each row does not exceed p, and the double dropping strategy is used to form the approximate Schur complements, we then have the following upper bound on the storage cost of the modified BILUM.

Proposition 3.2. Let m_j be the size of the independent set on level j and b be the uniform block size. Let the number of nonzeros in each row of all A_j, j = 0, 1, ..., L, not exceed p. Suppose each row of the L and U factors of the last-level ILUT has fewer than 2p nonzeros. Then the number of nonzeros of the modified BILUM with L levels of reduction is bounded by

    (p + b) n + (3p − b) m_L + p Σ_{j=1}^{L} j m_j.                (7)

Proof. Denote by r_j the size of the reduced system on level j. On each level, the maximum number of nonzeros of D_j is b m_j, and those of F_j and E_j are p m_j and p r_j, respectively. By construction, the last-level ILUT preconditioner has at most 4p m_L nonzeros. Hence, the total number of nonzeros of BILUM is bounded by

    Σ_{j=0}^{L−1} ((p + b) m_j + p r_j) + 4p m_L
        = (p + b) Σ_{j=0}^{L} m_j + (3p − b) m_L + p Σ_{j=0}^{L−1} r_j.   (8)

Note that the nodes in all independent sets, together with the last reduced system, constitute the order of A. It follows that

    Σ_{j=0}^{L} m_j = n.                                           (9)

Furthermore, we have r_j = m_{j+1} + r_{j+1} for j = 0, 1, ..., L − 1, with r_{L−1} = m_L and r_L = 0. It is easy to see that

    Σ_{j=0}^{L−1} r_j = Σ_{j=1}^{L} j m_j.                         (10)

Substituting (9) and (10) into (8) yields the bound (7).
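Identity (10), the key step in the proof, can be spot-checked in a few lines (with a hypothetical level-size sequence of our own choosing):

```python
# m[j] = size of the independent set on level j; here L = 3 levels of
# reduction, and the reduced-system sizes satisfy r_j = m_{j+1} + r_{j+1}.
m = [8, 4, 2, 1]
L = len(m) - 1
r = [sum(m[j + 1:]) for j in range(L + 1)]     # so r_{L-1} = m_L, r_L = 0
lhs = sum(r[j] for j in range(L))              # sum_{j=0}^{L-1} r_j
rhs = sum(j * m[j] for j in range(1, L + 1))   # sum_{j=1}^{L} j * m_j
```

Here lhs = rhs = 11, as (10) asserts; the weight j in front of m_j reflects that a node entering the independent set late has been carried through every earlier reduced system.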



Since n is far larger than m_L, the bound (7) shows that the size of the blocks heavily influences the sparsity of the preconditioner. On the other hand, since the last term of (7) does not contain m_0 and the factor j grows as the level increases, it is beneficial to have a large independent set on each level. Unfortunately, a large independent set is usually accompanied by large blocks.

4. Numerical experiments

Standard implementations of BILUM have been described in detail in [19,20] and a software package is freely available to the public from the author's web homepage at http://www.cs.uky.edu/jzhang. The modified BILUM differs from those reported in [19,20] in the sense that we do not have an inner iteration process in Lines 6 and 7 of Algorithm 3.1 to solve the last reduced system. Thus the preconditioner is a fixed approximate solution process. Unless otherwise indicated explicitly, we used these default


Table 1
Description of the test matrices from CFD applications

Matrix      Order    Nonzeros    Description
RAEFSKY1    3242     294 276     Flow in pressure driven pipe, time = 05
RAEFSKY2    3242     294 276     Flow in pressure driven pipe, time = 25
RAEFSKY3    21 200   1 488 768   Fluid structure interaction turbulence
RAEFSKY4    19 779   1 328 611   Buckling problem for container model
RAEFSKY5    6 316    168 658     Landing hydrofoil airplane FSE model
RAEFSKY6    3 402    137 845     Slosh tank model
UTM3060     3 060    42 211      Nuclear fusion plasma simulation
UTM5940     5 940    83 842      Nuclear fusion plasma simulation
VENKAT01    62 424   1 717 792   Unstructured 2D Euler solver, time = 0
VENKAT25    62 424   1 717 792   Unstructured 2D Euler solver, time = 25
VENKAT50    62 424   1 717 792   Unstructured 2D Euler solver, time = 50
WIGTO966    3 864    238 252     Euler equation model

parameters for our preconditioned iterative solvers: GMRES with a restart value of 20, BiCGSTAB and TFQMR were used as the accelerators; the maximum number of reductions allowed was 10; only the double dropping strategy was used to control the sparsity of the preconditioner. In all cases, BILUM was used as the right preconditioner. All matrices were treated as general sparse and nonsymmetric. The right-hand side of each linear system was generated by assuming that the solution is the vector of all ones, and the initial guess was a vector of random numbers. The computations were terminated when the 2-norm of the residual was reduced by a factor of 10^7. We also set an upper bound of 300 on the number of preconditioned iterations, where one iteration is defined as one preconditioned matrix–vector product involving both the coefficient matrix and the preconditioner, i.e., one application of both A and BILUM. The numerical experiments were conducted on a Silicon Graphics workstation using the Fortran 77 programming language.

In all tables with numerical results, "b" is the size of the uniform blocks, "τ" and "p" are the parameters used in the double dropping strategy, "ω" is the SVD threshold value (see Algorithm 3.2), and "ratio" is the sparsity ratio, i.e., the ratio of the number of nonzeros of the preconditioner to that of the original matrix [19]. The symbol "–" indicates a lack of convergence. The last reduced system is approximately factored by ILUT(0.1τ, 2p) [16]. Due to the different nature of the Krylov subspace methods, we only count the number of preconditioned matrix–vector products (PMVP). The major cost of a preconditioned iterative solver with a high accuracy preconditioner is the cost of the PMVP with both the coefficient matrix and the preconditioner. For comparison purposes, the number of PMVP gives a fairly accurate description of the overall cost of the preconditioned iterative solvers with the same preconditioner.
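For concreteness, the sketch below shows a bare-bones right-preconditioned restarted GMRES in the style of the GMRES(20) accelerator used here; it is our own illustration, not the paper's Fortran 77 solver, and `M_apply` stands in for one application of BILUM (replaced by a hypothetical Jacobi solve in the example):

```python
import numpy as np

def gmres_right(A, b, M_apply, m=20, tol=1e-10, max_restart=50):
    """Restarted GMRES(m) on A M^{-1} u = b with x = M^{-1} u (right
    preconditioning), so the true residual b - Ax is monitored."""
    n = b.size
    x = np.zeros(n)
    for _ in range(max_restart):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            return x
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        k = 0
        for j in range(m):                     # Arnoldi process
            w = A @ M_apply(V[:, j])
            for i in range(j + 1):
                H[i, j] = w @ V[:, i]
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            k = j + 1
            if H[j + 1, j] < 1e-12 * beta:     # happy breakdown
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)[0]
        x = x + M_apply(V[:, :k] @ y)          # unwind the preconditioner
    return x

# Hypothetical test system with a Jacobi stand-in preconditioner.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8)) + 8.0 * np.eye(8)
b = np.ones(8)
d = np.diag(A)
x = gmres_right(A, b, lambda v: v / d)
```

With right preconditioning, each pass through the inner loop costs exactly one PMVP in the sense counted in the tables: one product with A and one application of the preconditioner.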
When the accuracy of the preconditioners differs, the iteration count and the sparsity ratio together give the full description of the overall cost. Thus, the overall cost may be formulated as (ratio + 1) × PMVP. All matrices described in Table 1 (except WIGTO966) are available online from the University of Florida Sparse Matrix Collection. 2 The WIGTO966 matrix may be obtained from the author.

RAEFSKY matrices. The RAEFSKY matrices were provided by H. Simon from Lawrence Berkeley National Laboratory and were originally generated by A. Raefsky from Centric Engineering. Test results for these matrices are given in Table 2. The first three RAEFSKY matrices are benign for the iterative solvers. Although the PMVP numbers and the sparsity ratios are relatively large for the first three matrices, compared with those for the last three, the convergence behavior of the iterative solvers was not critically sensitive to the choice of the parameters, such as the size of the blocks and the amount of fill-in. In particular, better performance was found with large block sizes and high sparsity ratios, which are consistent

2 The URL address is http://www.cise.ufl.edu/davis/sparse.

Table 2
Number of preconditioned matrix–vector products for solving the RAEFSKY matrices

Matrix      Parameters                      BCGTB   GMRES   TFQMR   ratio
RAEFSKY1    τ=10^-3, p=20, b=50             166     –       210     1.1
RAEFSKY2    τ=10^-3, p=20, b=50             164     –       180     1.1
RAEFSKY3    τ=10^-4, p=30, b=80             172     –       146     1.5
RAEFSKY4    τ=10^-4, p=5,  b=10             110     41      10      0.5
RAEFSKY5    τ=10^-4, p=5,  b=1              10      5       12      0.3
RAEFSKY6    τ=10^-4, p=1,  b=1              12      9       –       0.1

Matrix      Parameters                      BCGTB   GMRES   TFQMR   ratio
RAEFSKY1    τ=10^-4, p=50, b=100            52      40      58      1.9
RAEFSKY2    τ=10^-4, p=50, b=100            64      64      68      1.9
RAEFSKY3    τ=10^-4, p=50, b=100            6       6       6       1.9
RAEFSKY4    τ=10^-4, p=50, b=100, ω=1.0     –       –       –       –
RAEFSKY5    τ=10^-4, p=6,  b=30             3       3       3       0.7
RAEFSKY6    τ=10^-5, p=5,  b=20             12      6       12      0.5


and desirable. We note that, although the results differ at low sparsity ratios, the preconditioned iterative methods yielded almost the same results at high sparsity ratios. For the last three RAEFSKY matrices, especially RAEFSKY4, the choice of parameters was critical and large blocks were sometimes troublesome. Due to the structure of the matrices, some of the large blocks were near-singular, and the resulting preconditioners were unstable. Note that the fill-in parameter p is small for these matrices; large values of p yielded unstable preconditioners. Fig. 1 shows the nonzero patterns of the RAEFSKY1 (left) and RAEFSKY6 (right) matrices. It can be seen that RAEFSKY1 looks structurally symmetric, while RAEFSKY6 is badly unstructured. Fig. 2 depicts the convergence behavior of the preconditioned solvers for RAEFSKY4 with large blocks (b = 100) and a large amount of fill-in (p = 50). We see that the convergence behavior of BiCGSTAB and GMRES with standard BILUM was irregular, while that of TFQMR was smooth but stagnated. The enhanced BILUM with the SVD regularized inverse was able to stabilize the convergence behavior, but none of the solvers converged within 300 iterations.

TOKAMAK matrices. The TOKAMAK matrices were provided by P. Brown of Lawrence Livermore National Laboratory. We tested the largest two of this collection. For UTM3060 with a low sparsity ratio

Fig. 1. Sparsity patterns of the RAEFSKY1 and RAEFSKY6 matrices.

Fig. 2. Convergence history for solving RAEFSKY4 with standard and enhanced BILUM and τ = 10^-4, p = 50, b = 100, ratio = 2.2.


of 3.0, we used τ = 10^-4, p = 15, b = 30. The convergence history is depicted in the left part of Fig. 3. We see that both BiCGSTAB and TFQMR converged faster than GMRES. On the other hand, the right part of Fig. 3 shows that with a high sparsity ratio of 5.4 (with τ = 10^-4, p = 30, b = 60), GMRES converged more quickly than both BiCGSTAB and TFQMR. It is interesting to point out that the convergence behaviors of BiCGSTAB and TFQMR were similar, but that of TFQMR was smoother. Similar tests were performed with the UTM5940 matrix. The left part of Fig. 4 shows the convergence results with a low sparsity ratio preconditioner (τ = 10^-4, p = 30, b = 40, ratio = 4.1) and the right part shows those with a high sparsity ratio one (τ = 10^-4, p = 60, b = 80, ratio = 7.3). Once again, BiCGSTAB was the fastest and GMRES the slowest with the low sparsity ratio. With the high sparsity ratio, GMRES converged fastest, while TFQMR was slowest. The convergence behavior of GMRES was smooth, while that of BiCGSTAB showed abrupt jumps and drops.

Fig. 3. Convergence history for solving UTM3060 with different sparsity ratios.

Fig. 4. Convergence history for solving UTM5940 with different sparsity ratios.


Table 3
Number of preconditioned matrix–vector products for solving the VENKAT matrices

            τ=10^-3, p=20, b=50 (ratio = 2.3)    τ=10^-4, p=50, b=100 (ratio = 4.0)
Matrix      BCGTB   GMRES   TFQMR                BCGTB   GMRES   TFQMR
VENKAT01    14      13      14                   8       7       8
VENKAT25    170     150     174                  50      50      50
VENKAT50    –       268     222                  48      48      48

Table 4 Number of preconditioned matrix±vector products for solving the WIGTO966 matrix BCGTB

GMRES

s ˆ 10ÿ4 ; p ˆ 40; b ˆ 50 288 ±

TFQMR 222

s

BCGTB

1.6

s ˆ 10ÿ4 ; p ˆ 40; b ˆ 100 172 208

ÿ4

s ˆ 10 ; p ˆ 50; b ˆ 100 30 29

TFQMR

s

188

2.2

90

2.5

70

3.0

ÿ4

34

2.3

ÿ4

s ˆ 10 ; p ˆ 60; b ˆ 120 20 16

GMRES

s ˆ 10 ; p ˆ 60; b ˆ 100 96 76 ÿ4

20

2.6

s ˆ 10 ; p ˆ 70; b ˆ 120 76 56

VENKAT matrices. The VENKAT matrices were supplied by V. Venkatakrishnan from NASA Ames Research Center. The difficulty of the matrices increases as the time evolves. The convergence results are listed in Table 3. With a low accuracy BILUM, there are some differences in performance as the difficulty of the matrices increases. GMRES and TFQMR were slightly better than BiCGSTAB. In fact, we also tested another set of parameters, τ = 10^-3, p = 10, b = 50 (ratio = 1.9), for VENKAT01; only GMRES converged, in 13 iterations. However, the important observation is that, with a high accuracy BILUM, the convergence results of the three preconditioned solvers are (almost) identical. These are good examples supporting our assertion that it is the quality of the preconditioner that plays the critical role in a preconditioned iterative solver.

WIGTO966 matrix. The WIGTO966 matrix was supplied by L. Wigton from the Boeing Commercial Airplane Group. Despite its moderate size, this matrix has been solved by the threshold based ILUT(τ, p) only with large values of p and high sparsity ratios [4]. For this reason, it has been used by several authors to test new alternative preconditioners [19,23]. We chose several sets of parameters and report the results in Table 4. It can be seen that large blocks are beneficial and the performance of the three solvers is comparable. With a high accuracy BILUM, the performance of the accelerators, especially GMRES, was enhanced substantially. The sparsity ratios are low compared with those that would be required by ILUT. Typically, BILUM needs only about a quarter of the memory space required by ILUT [23] and is often several times faster.

FIDAP matrices. The FIDAP matrices 3 were extracted from the test problems provided in the FIDAP package [7]. They were generated by I. Hasbani of Fluid Dynamics International and B. Rackner of the Minnesota Supercomputer Center. The matrices resulted from modeling the incompressible Navier–Stokes equations and were generated using whatever solution method was specified in the input decks. However, if the penalty method was used, there is usually a corresponding FIDAPM matrix, which was constructed using a fully coupled solution method (mixed u-p formulation). The penalty method gives very ill-conditioned matrices, whereas the mixed u-p method gives indefinite, larger systems (they include the pressure variables).

3 All FIDAP matrices are available from the MatrixMarket of the National Institute of Standards and Technology at http://math.nist.gov/MatrixMarket.


Table 5
Description of the FIDAP matrices

Matrix      Order    Nonzeros   Description
FIDAP012    3973     80 151     Flow in lid-driven wedge (28.5 degree angle)
FIDAP013    2568     75 628     Axisymmetric poppet valve (gap 0.01 in, dp 500 psi)
FIDAP015    6867     96 421     Spin up of a liquid in an annulus
FIDAP018    5773     69 335     2D turbulent flow over a backward-facing step
FIDAP020    2203     69 579     Attenuation of a surface disturbance
FIDAP024    2283     48 733     Unsymmetric forward roll coating
FIDAP028    2603     77 653     Two merging liquids with one external interior interface
FIDAP029    2870     23 754     Turbulent flow in axisymmetric U-bend
FIDAP031    3909     115 299    Dilute species deposition on a tilted heated plate
FIDAP035    19 716   218 308    Turbulent flow in a heated channel (Re = 25960)
FIDAP037    3565     67 591     Flow of plastic in a profile extrusion die
FIDAP040    7740     456 226    3D die-swell (square die, Re = 1, Ca = 1)
FIDAPM03    2532     50 380     Flow past a cylinder in free stream (Re = 40)
FIDAPM07    2065     53 533     Natural convection in a square enclosure (Ra = 1000)
FIDAPM08    3876     103 076    Developing flow, vertical channel (angle = 0, Ra = 1000)
FIDAPM11    22 294   623 554    3D steady flow, cylinder and flat plate heat exchanger
FIDAPM13    3549     71 975     Axisymmetric poppet valve (gap 0.01 in, dp 500)
FIDAPM15    9287     98 519     Spin up of a liquid in an annulus
FIDAPM33    2353     23 765     Radiation heat transfer in a square cavity

Several of these matrices contain small or zero diagonal values [6]. 4 The zero diagonals are due to the incompressibility condition of the Navier–Stokes equations [6], and the substantial number of zero diagonals makes these matrices indefinite. It is remarked in [4] that the FIDAP matrices are difficult to solve with ILU preconditioning techniques: they require a high level of fill-in to be effective, and the performance of the preconditioners is unstable with respect to the amount of fill-in. Many of them cannot be solved by the standard BILUM preconditioner, and in some cases even the construction of BILUM failed due to the occurrence of very ill-conditioned blocks. Nevertheless, some of them may be solved by the enhanced version of BILUM using the singular value decomposition (SVD) based regularized inverse technique and variable block sizes [23]. The details of several of the largest FIDAP matrices are listed in Table 5 and the corresponding results are given in Table 6, except for FIDAP035. All of them were solved by the SVD enhanced BILUM preconditioner. Although there are exceptions, high accuracy BILUM in general yielded a better convergence rate. The exceptions may be attributed to the difficulty in choosing an appropriate SVD threshold parameter ω. Not only may different problems have different optimal ω, the same problem with different block sizes may also have a different optimal ω. Our results were obtained by trial and error, and we do not claim the listed parameters are optimal. The choice of the SVD parameter introduces inconvenience in the application of BILUM, but the SVD technique enhanced the robustness of the standard BILUM remarkably. Most of the problems could not be solved by the standard BILUM with the given sparsity ratios. The test results also show that the new formula for the SVD regularized inverse worked fine. There is no fundamental difference between the formulas, but the current one seems to allow the optimal SVD threshold parameter ω to be restricted to a narrower region, which may slightly reduce the difficulty of choosing a suitable parameter. Fig. 5 shows the convergence behavior of the preconditioned iterative methods for solving the FIDAP035 matrix with different sparsity ratios. This high Reynolds number flow problem seems hard to solve by our solvers: two runs with different accuracy failed to converge, although high accuracy BILUM did improve the convergence rate.
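The regularized inverse idea can be sketched in a few lines: for an ill-conditioned diagonal block B, singular values below ω times the largest one are dropped before inverting, so near-null directions of B cannot inject huge entries into the preconditioner. The sketch below is a generic truncated-SVD inverse given for illustration; the paper's exact formula for handling the dropped values is not reproduced here.

```python
import numpy as np

def svd_regularized_inverse(B, omega):
    # Approximate inverse of a small dense block B. Singular values below
    # omega * sigma_max are treated as zero (numpy returns them in
    # descending order, so s[0] is sigma_max). This is a truncated-SVD
    # pseudoinverse, not necessarily the paper's exact formula.
    U, s, Vt = np.linalg.svd(B)
    keep = s > omega * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return (Vt.T * s_inv) @ U.T
```

For a well-conditioned block and a tiny ω this reduces to the exact inverse; for a singular block it stays bounded instead of blowing up, which is precisely the robustness the enhanced BILUM needs.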

4 The FIDAP matrices have structural zeros added on the off-diagonals to make them structurally symmetric. Structural zeros were also added to the diagonals.

Table 6
Number of preconditioned matrix–vector products for solving the FIDAP matrices (a dash indicates that the solver did not converge)

Low accuracy BILUM:

Matrix     τ      p    b    ω      BCGTB  GMRES  TFQMR  Spar.
FIDAP012   10^-4  30   50   10^-5  184    119    132    3.6
FIDAP013   10^-4  50   110  10^-4  62     17     -      4.3
FIDAP015   10^-4  40   100  10^-3  -      193    -      7.8
FIDAP018   10^-4  30   90   10^-2  162    -      -      8.0
FIDAP020   10^-4  20   30   10^-5  146    150    122    1.7
FIDAP024   10^-4  20   30   10^-3  200    106    86     2.2
FIDAP028   10^-4  30   50   10^-6  158    3      112    2.5
FIDAP029   10^-5  10   30   10^-4  255    237    4      3.9
FIDAP031   10^-4  20   20   10^-3  72     62     154    1.5
FIDAP037   10^-4  20   30   10^-4  144    284    74     2.6
FIDAP040   10^-4  70   100  10^-4  4      293    192    2.9
FIDAPM03   10^-4  20   30   10^-2  96     -      212    2.6
FIDAPM07   10^-4  90   60   10^-4  68     210    220    4.7
FIDAPM08   10^-4  20   30   10^-4  98     -      134    2.0
FIDAPM11   10^-4  20   30   10^-2  153    283    170    2.6
FIDAPM13   10^-4  50   100  10^-5  -      62     154    5.9
FIDAPM15   10^-4  50   30   10^-5  -      284    72     5.5
FIDAPM33   10^-4  20   30   10^-4  132    62     -      4.2

High accuracy BILUM:

Matrix     τ      p    b    ω      BCGTB  GMRES  TFQMR  Spar.
FIDAP012   10^-4  50   100  10^-5  42     83     80     11.7
FIDAP013   10^-4  40   140  10^-4  46     40     -      4.2
FIDAP015   10^-5  40   120  10^-2  76     -      -      6.5
FIDAP018   10^-4  50   100  10^-2  16     -      202    2.9
FIDAP020   10^-4  50   50   10^-5  20     277    76     6.0
FIDAP024   10^-4  50   80   10^-5  -      39     25     -
FIDAP028   10^-4  50   100  10^-5  46     19     16     10.5
FIDAP029   10^-6  50   100  10^-3  236    14     14     10.8
FIDAP031   10^-4  50   100  10^-5  12     64     16     4.7
FIDAP037   10^-4  50   100  10^-6  14     38     18     6.5
FIDAP040   10^-4  70   100  10^-4  18     37     178    6.2
FIDAPM03   10^-4  50   100  10^-3  16     -      46     6.0
FIDAPM07   10^-4  50   100  10^-3  20     18     24     5.1
FIDAPM08   10^-4  50   100  10^-5  28     17     18     8.9
FIDAPM11   10^-4  50   100  10^-5  176    12     90     8.8
FIDAPM13   10^-4  70   100  10^-5  257    14     50     2.7
FIDAPM15   10^-4  50   100  10^-5  160    12     50     4.6
FIDAPM33   10^-4  50   100  10^-6  116    12     -      4.1


Fig. 5. Convergence history for solving FIDAP035 with different sparsity ratios. Left: τ = 10^-4, p = 30, b = 30, ω = 0.1. Right: τ = 10^-3, p = 50, b = 100, ω = 0.01.

Table 7
Number of matrix–vector products without preconditioning (a dash indicates that the solver did not converge)

Matrix     BCGTB  GMRES  TFQMR
RAEFSKY4   -      -      280
RAEFSKY5   168    51     74
FIDAP029   154    86     94
FIDAP037   98     108    82

Without preconditioning. Finally, it is worth pointing out that, for 27 of the 31 matrices tested previously, none of the three Krylov subspace methods without a preconditioner converged within 300 matrix–vector products (MVP). Table 7 lists the number of MVPs for the four matrices that were solved by the unpreconditioned iterative solvers. Our test results show that these Krylov subspace methods are much less efficient, and in most cases useless, for solving the given problems without preconditioning.

5. Concluding remarks

Through numerical tests, we have studied three popular Krylov subspace methods, BiCGSTAB, GMRES and TFQMR, preconditioned by a powerful general purpose multilevel block ILU factorization preconditioner (BILUM). Our test problems include several collections of nonsymmetric matrices arising from realistic CFD applications. Both the standard and the enhanced versions of the BILUM preconditioner were found useful in improving the robustness and efficiency of the Krylov subspace methods. In addition, we outlined an improved variant of BILUM with a more efficient storage strategy and a fixed coarsest level (approximate) solver to allow the use of BiCGSTAB and TFQMR as the accelerator. An upper bound for the sparsity amount of the modified BILUM was given. We implemented a new formulation of the SVD regularized inverse technique with BILUM. Our findings reinforce the current consensus among preconditioning practitioners that the quality of the preconditioner is more critical than the choice of the Krylov subspace accelerator in designing a preconditioned iterative solver for large scale CFD applications. It is hard to determine which Krylov subspace method is more efficient for solving nonsymmetric linear systems arising from CFD applications if only a low accuracy preconditioner is employed.
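The comparison above can be reproduced in miniature with generally available tools. The sketch below runs the same three transpose-free accelerators over one shared preconditioner; scipy's spilu stands in for BILUM (which is not publicly packaged), and a small convection–diffusion-type matrix stands in for the CFD test matrices, so all names and parameters here are illustrative rather than the paper's.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small nonsymmetric tridiagonal system: a stand-in for the CFD matrices.
n = 500
A = sp.diags([-1.4, 4.0, -0.6], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# One shared ILU preconditioner for all three accelerators.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10.0)
M = spla.LinearOperator(A.shape, ilu.solve)

# With a good M, all three accelerators behave very similarly,
# mirroring the paper's observation for high accuracy BILUM.
residuals = {}
for solver in (spla.bicgstab, spla.gmres, spla.tfqmr):
    x, info = solver(A, b, M=M)
    residuals[solver.__name__] = np.linalg.norm(A @ x - b)
```

Swapping a looser preconditioner (larger drop_tol, smaller fill_factor) into the same harness is the quickest way to see the accelerators' behavior diverge, which is the paper's low-accuracy regime.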
Our experiments show that none of them is an absolute winner for a large group of matrices, and all of them may fail for certain difficult real life problems. On the


other hand, if a high accuracy preconditioner is used, there is little difference in the performance of the Krylov subspace methods; in fact, in several cases they converged in exactly the same number of iterations. Based on our experiments and experience, we suggest that more attention be given to the development and identification of high accuracy preconditioners that are robust and efficient in large scale CFD and other applications. We also see a potential advantage of the transpose-free Krylov subspace methods. Since the preconditioner plays a critical role in the preconditioned iterative solver and usually represents the major portion of the memory cost, it may be more profitable to use transpose-free methods and to allocate the storage space that would otherwise be required for storing the transpose of the matrix to the computation of a more accurate preconditioner. Although most iterative methods are parameter-free, most high accuracy preconditioners are not. Modern preconditioners commonly need some carefully chosen parameters in order to yield their best performance, and these parameters are usually problem dependent. This feature of modern preconditioned iterative methods, although it causes some inconvenience to end users, also allows the efficient use of available resources, e.g., the available memory space on a given machine. The multilevel block ILU preconditioner discussed in this paper and several of its variants [22,21,24,28,29] allow some flexibility to control memory cost. They also have a high degree of inherent parallelism that can be exploited in parallel or distributed computing environments [19].
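The sparsity ratio used as the memory measure throughout the paper, nonzeros kept in the incomplete factors relative to the nonzeros of A, is simple to compute from the stored factors. A sketch, again with scipy's ILU standing in for BILUM and a synthetic matrix standing in for the test set:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sparsity_ratio(A, drop_tol=1e-4, fill_factor=10.0):
    # nnz(L) + nnz(U) of the incomplete factors over nnz(A): a ratio of 2.0
    # means the preconditioner costs twice the memory of the matrix itself.
    ilu = spla.spilu(A.tocsc(), drop_tol=drop_tol, fill_factor=fill_factor)
    return (ilu.L.nnz + ilu.U.nnz) / A.nnz

# A tridiagonal test matrix: its exact LU factors are bidiagonal, so the
# ratio stays modest even with a tight drop tolerance.
n = 300
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ratio = sparsity_ratio(A)
```

Tightening drop_tol or raising fill_factor trades memory for preconditioner accuracy, which is exactly the resource trade-off discussed above.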

References

[1] R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA, 1993.
[2] M. Benzi, M. Tuma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput. 19 (3) (1998) 968-994.
[3] R. Bru, D. Ginestar, T. Manteuffel, J. Marin, G. Verdu, Iterative schemes for the neutron diffusion equation, in: Preliminary Proceedings of the 1998 Copper Mountain Conference on Iterative Methods, Copper Mountain, CO, March 30 - April 3, 1998.
[4] A. Chapman, Y. Saad, L. Wigton, High-order ILU preconditioners for CFD problems, Technical Report UMSI 96/14, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN, 1996.
[5] E. Chow, Y. Saad, Approximate inverse techniques for block-partitioned matrices, SIAM J. Sci. Comput. 18 (1997) 1657-1675.
[6] E. Chow, Y. Saad, Experimental study of ILU preconditioners for indefinite matrices, J. Comput. Appl. Math. 86 (2) (1997) 387-414.
[7] M. Engelman, FIDAP: Examples Manual, Revision 6.0, Technical Report, Fluid Dynamics International, Evanston, IL, 1991.
[8] R.W. Freund, A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems, SIAM J. Sci. Comput. 14 (1993) 470-482.
[9] G.H. Golub, H.A. van der Vorst, Closer to the solution: iterative linear solvers, in: I.S. Duff, G.A. Watson (Eds.), The State of the Art in Numerical Analysis, Clarendon Press, Oxford, 1997, pp. 63-92.
[10] G.H. Golub, C.F. van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 1983.
[11] N.I.M. Gould, J.A. Scott, Sparse approximate-inverse preconditioners using norm-minimization techniques, SIAM J. Sci. Comput. 19 (2) (1998) 605-625.
[12] M.R. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49 (1952) 409-436.
[13] N.M. Nachtigal, S.C. Reddy, L.N. Trefethen, How fast are nonsymmetric matrix iterations?, SIAM J. Matrix Anal. Appl. 13 (3) (1992) 778-795.
[14] G. Radicati, Y. Robert, S. Succi, Iterative algorithms for the solution of nonsymmetric systems in the modelling of weak plasma turbulence, J. Comput. Phys. 80 (1989) 489-497.
[15] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Stat. Comput. 14 (2) (1993) 461-469.
[16] Y. Saad, ILUT: a dual threshold incomplete LU factorization, Numer. Linear Algebra Appl. 1 (4) (1994) 387-402.
[17] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS, New York, 1996.
[18] Y. Saad, M.H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (3) (1986) 856-869.
[19] Y. Saad, M. Sosonkina, J. Zhang, Domain decomposition and multi-level type techniques for general sparse linear systems, in: J. Mandel, C. Farhat, X.-C. Cai (Eds.), Domain Decomposition Methods 10, Contemporary Mathematics, vol. 218, AMS, Providence, RI, 1998, pp. 174-190.
[20] Y. Saad, J. Zhang, BILUM: block versions of multielimination and multilevel ILU preconditioner for general sparse linear systems, SIAM J. Sci. Comput. 20 (6) (1999) 2103-2121.
[21] Y. Saad, J. Zhang, BILUTM: a domain-based multi-level block ILUT preconditioner for general sparse matrices, SIAM J. Matrix Anal. Appl. (to appear).
[22] Y. Saad, J. Zhang, Diagonal threshold techniques in robust multi-level ILU preconditioners for general sparse linear systems, Numer. Linear Algebra Appl. (to appear).


[23] Y. Saad, J. Zhang, Enhanced multi-level block ILU preconditioning strategies for general sparse linear systems, Technical Report UMSI 98/98, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN, 1998.
[24] Y. Saad, J. Zhang, A multi-level preconditioner with applications to the numerical simulation of coating problems, in: D.R. Kincaid et al. (Eds.), Iterative Methods in Scientific Computing II, IMACS, New Brunswick, NJ, 1999 (to appear).
[25] J.N. Shadid, R.S. Tuminaro, A comparison of preconditioned nonsymmetric Krylov methods on a large-scale MIMD machine, SIAM J. Sci. Comput. 15 (2) (1994) 440-459.
[26] H. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 13 (1992) 631-644.
[27] H.A. van der Vorst, C. Vuik, GMRESR: a family of nested GMRES methods, Numer. Linear Algebra Appl. 1 (4) (1994) 369-386.
[28] J. Zhang, A grid based multilevel incomplete LU factorization preconditioning technique for general sparse matrices, Technical Report No. 283-99, Department of Computer Science, University of Kentucky, Lexington, KY, 1999.
[29] J. Zhang, RILUM: a general framework for robust multilevel recursive incomplete LU preconditioning techniques, Technical Report No. 284-99, Department of Computer Science, University of Kentucky, Lexington, KY, 1999.