A parallel algorithm to approximate inverse factors of a matrix via sparse–sparse iterations

Applied Mathematics and Computation 181 (2006) 782–792
www.elsevier.com/locate/amc

Davod Khojasteh Salkuyeh a,*, Saeed Karimi b, Faezeh Toutounian c

a Department of Mathematics, Mohaghegh Ardabili University, P.O. Box 56199-11367, Ardabil, Iran
b Department of Mathematics, Persian Gulf University, Boushehr, Iran
c School of Mathematical Sciences, Ferdowsi University of Mashhad, P.O. Box 1159-91775, Mashhad, Iran

* Corresponding author.

Abstract

In [D.K. Salkuyeh, F. Toutounian, A block version algorithm to approximate inverse factors, Appl. Math. Comput. 162 (2005) 1499–1509], the authors proposed the BAIB algorithm to approximate inverse factors of a matrix. In this paper a parallel version of the BAIB algorithm is presented. In this method the BAIB algorithm is combined with computing the sparse approximate solution of a sparse linear system by sparse–sparse iterations. The new method does not require that the sparsity pattern be known in advance. Some numerical experiments on test matrices from the Harwell–Boeing collection are presented to show the efficiency of the new method and to compare it with the AIB algorithm.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Inverse factors; Preconditioning; Krylov subspace methods; Sparse matrices; AIB algorithm; BAIB algorithm; Parallel

1. Introduction

The problem of finding the solution vector x of a linear system of equations

Ax = b,    (1)

where A ∈ R^{n×n} is large and sparse and x, b ∈ R^n, arises in many application areas. Iterative methods combined with preconditioning techniques are among the most efficient techniques for solving (1). More precisely, iterative methods usually involve a second matrix that transforms the coefficient matrix into one with a more favorable spectrum. The transformation matrix is called a preconditioner. Without a preconditioner, an iterative method may converge slowly or even fail to converge. Approximate inverse techniques can be divided mainly into two categories. In the first category, a nonsingular matrix M is computed such that M ≈ A^{-1} in some sense. In this case the transformed linear systems

AMy = b,  x = My,  or  MAx = Mb    (2)




will have the same solution as system (1), but may be easier to solve. Systems (2) are called the right- and left-preconditioned systems, respectively. The basic idea to compute a right sparse approximate inverse is to minimize the Frobenius norm of the residual matrix I − AM. We have

F(M) = \|I − AM\|_F^2 = \sum_{j=1}^{n} \|e_j − Am_j\|_2^2,    (3)

where e_j denotes the jth column of the identity matrix. Hence minimizing (3) is equivalent to minimizing the independent functions

f_j(m) = \|e_j − Am\|_2^2,  j = 1, . . . , n.    (4)

If the sparsity pattern of M is prescribed, then (4) reduces to solving n small least squares problems. A left sparse approximate inverse can be obtained by computing a right sparse approximate inverse of A^T and then taking its transpose. One of the methods based on Frobenius norm minimization is the SPAI algorithm proposed by Grote and Huckle in [12]. In this method the initial structure of M is chosen to be diagonal, and a procedure is then used to improve the minimum by updating the sparsity pattern of M. A similar method is proposed by Gould and Scott in [11]. Chow and Saad [5] use a few iterations of an iterative method, such as the minimal residual iteration, to minimize the functions in (4). Their method does not need the sparsity pattern of M to be known in advance; the sparsity of M is maintained by a dropping rule. Now, let us assume that A has the LU factorization and

M = M_U M_L,  where  M_U ≈ U^{-1}  and  M_L ≈ L^{-1},    (5)

where L and U are the lower and upper triangular factors of A. This type of preconditioner is known as a factored sparse approximate inverse, and M_U and M_L are called sparse approximate inverse factors of A. Here, the transformed linear system can be considered as follows:

M_L A M_U y = M_L b,  x = M_U y.    (6)

System (6) is called the split-preconditioned system. In practice we do not need to compute AM (or MA) explicitly, because when Krylov subspace methods such as CG [18], GMRES [19], CGS [21] or Bi-CGSTAB [22] are used to solve (2) or (6), only matrix–vector products with M are required, and these can be computed in parallel.

Here, we focus our attention on the computation of sparse approximate inverse factors of a matrix. There are different ways to compute sparse approximate inverse factors, each with its own advantages and disadvantages. In [4] the AINV method was proposed, which is based on an algorithm that computes two sets of vectors {z_i}_{i=1}^{n} and {w_i}_{i=1}^{n} which are A-biconjugate, i.e., such that w_i^T A z_j = 0 if and only if i ≠ j. A similar method was proposed in [23] by Zhang. Another approach to compute sparse approximate inverse factors of a matrix is the AIB algorithm, which is based on a bordering technique [2,4,18]. A block version of the AIB algorithm, referred to as the BAIB algorithm, can be found in [13]. None of these methods requires that the sparsity pattern be known in advance, but they are inherently sequential. Another approach, proposed by Kolotilina and Yeremin, is the FSAI algorithm [16,17]. They assume that A is symmetric positive definite (SPD) and construct factorized sparse approximate inverse preconditioners which are also SPD. Each factor implicitly approximates the inverse of the lower triangular Cholesky factor of A. This method can easily be extended to the nonsymmetric case. The FSAI algorithm is inherently parallel, but its main disadvantage is the need to prescribe the sparsity of the approximate inverse factors in advance. In this paper, to achieve parallelism, the BAIB algorithm is combined with approximating the sparse solution of a linear system by sparse–sparse iterations, and a parallel version of the BAIB algorithm is proposed.

This paper is organized as follows. In Section 2, we review some available methods to approximate the sparse solution of a linear system of equations which are based on sparse–sparse iterations. The AIB algorithm is reviewed in Section 3. Section 4 is devoted to the BAIB algorithm and its parallel version. Analysis of the new method and its implementation details are given in Section 5. Some numerical examples are given in Section 6. Finally, we give some concluding remarks in Section 7.


2. Sparse approximate solution to Ax = b

In this section, we review some available techniques to compute a sparse approximate solution to Ax = b. When the nonzero pattern of x is fixed in advance, the computation of x can be accomplished as follows. The nonzero pattern of x is a subset J ⊆ {i | 1 ≤ i ≤ n} such that x_i = 0 if i ∉ J. Now a sparse vector m is computed as the solution of the following constrained minimization problem:

\min_{m ∈ S} \|b − Am\|_2,    (7)

where S is the set of vectors with nonzero pattern J. The vector m can be considered as a sparse approximation of the vector x. Let \hat{A} = A(J) be the submatrix of A with column indices in J and let \hat{m} = m(J). Then the nonzero entries of m can be computed by solving the unconstrained least squares problem

\min_{\hat{m}} \|b − \hat{A}\hat{m}\|_2.    (8)

In our context the vector b is also very sparse. Hence many rows of this least squares problem are zero and can be deleted. Therefore the least squares problem (8) is extremely small and can be solved by means of the QR factorization of \hat{A}. For more details see [12].

In the method described above, the nonzero pattern of x defines its numerical values. Chow and Saad [5,6] use a few steps of an iterative method such as the minimal residual (MR) algorithm to compute a sparse approximate solution of Ax = b. Their method automatically generates new entries, to which they apply a dropping strategy to remove the excessive fill-in appearing in x. In the MR variation described by Algorithm 1 below [7], fill-in is introduced one entry at a time in step 3.

Algorithm 1. MR iteration
1. Choose an initial sparse guess x and set r := b − Ax
2. While x has fewer than lfil nonzeros Do:
3.   Choose d to be r with the same pattern as x; if nnz(x) < lfil then add one entry, namely the largest remaining entry of r in absolute value
4.   q := Ad
5.   α := (r, q)/(q, q)
6.   r := r − αq
7.   x := x + αd
8. EndDo

To keep the iterations economical, all computations are performed with sparse matrix–sparse vector or sparse vector–sparse vector operations. In this algorithm, if we ignore the numerical dropping then we have the optimal case, i.e., the maximum reduction in the residual norm. Unfortunately this optimality is destroyed by the numerical dropping, even if A is an SPD matrix.
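As an illustration, the following Python sketch carries out Algorithm 1 with dense NumPy arrays (the implementation described in the paper uses sparse storage and sparse–sparse operations throughout); the function name and the safety cap on the number of steps are our own choices and not part of the original algorithm.

```python
import numpy as np

def mr_sparse_approx(A, b, lfil):
    """Sketch of Algorithm 1 (MR iteration): build a sparse approximate
    solution of A x = b by enlarging the pattern of x one entry at a time."""
    n = A.shape[0]
    x = np.zeros(n)
    r = b.astype(float)
    for _ in range(4 * lfil):                 # safety cap (not in Algorithm 1)
        if np.count_nonzero(x) >= lfil:
            break
        d = np.where(x != 0, r, 0.0)          # d = r restricted to pattern of x
        masked = np.where(x == 0, np.abs(r), -np.inf)
        j = int(np.argmax(masked))            # largest remaining residual entry
        d[j] = r[j]                           # ... added to the pattern
        q = A @ d
        if q @ q == 0.0:
            break
        alpha = (r @ q) / (q @ q)             # minimal residual step length
        r -= alpha * q
        x += alpha * d
    return x
```

For example, applying mr_sparse_approx(A, e_j, lfil) to the jth column e_j of the identity gives a sparse approximation of the jth column of A^{-1}, which is how such iterations are used in approximate inverse preconditioning [5].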

In [15], an algorithm was proposed to compute a sparse approximate solution to an SPD linear system Ax = b. In this algorithm no dropping strategy is needed and the sparsity of x is preserved only by specifying its number of nonzero elements, lfil, in advance. In each iteration at most m (m ≪ n) entries are added to the current approximate solution. This algorithm can be written as follows.

Algorithm 2. Sparse approximate solution to SPD linear systems
1. Set x := 0 and r := b
2. For i = 1, . . . , nx and while nnz(x) < lfil Do:
3.   Select the indices of the m components of largest absolute value in the current residual vector r, i.e., J = {i_1, i_2, . . . , i_m} ⊂ {1, 2, . . . , n}, and set E := [e_{i_1}, e_{i_2}, . . . , e_{i_m}]
4.   Solve E^T A E y = E^T r for y
5.   x := x + Ey
6.   r := r − AEy
7. EndDo

In step 3 of this algorithm e_{i_j} denotes the i_j-th column of the identity matrix. Here we note that the matrix S = E^T A E is an SPD matrix of dimension m and is the principal submatrix of A consisting of the rows and columns of A with indices in J. The vector x computed by this algorithm has at most lfil nonzero entries. In a practical implementation of the algorithm the number m is usually chosen to be very small, for example m = 1, 2 or 3. The number nx is taken somewhat greater than lfil because an entry of x may be improved several times while x still contains very few nonzero elements. It has been shown in [15] that this algorithm is more effective than Algorithm 1 in the case that the matrix A is SPD. The case m = 1 was investigated in [14].
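The sketch below mirrors Algorithm 2 with dense NumPy arithmetic; it assumes A is SPD, and the function name and the default choice nx = 2·lfil are ours, used only for illustration.

```python
import numpy as np

def spd_sparse_approx(A, b, lfil, m=2, nx=None):
    """Sketch of Algorithm 2: repeatedly solve a small m-by-m SPD system on
    the indices of the m largest residual components (no dropping needed)."""
    n = A.shape[0]
    nx = 2 * lfil if nx is None else nx       # nx somewhat larger than lfil
    x = np.zeros(n)
    r = b.astype(float)
    for _ in range(nx):
        if np.count_nonzero(x) >= lfil:
            break
        J = np.argsort(-np.abs(r))[:m]        # m largest residual entries
        S = A[np.ix_(J, J)]                   # S = E^T A E, SPD of order m
        y = np.linalg.solve(S, r[J])          # solve E^T A E y = E^T r
        x[J] += y                             # x := x + E y
        r -= A[:, J] @ y                      # r := r - A E y
    return x
```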

3. The AIB algorithm

In the AIB algorithm [3,18], a sequence

A_{k+1} = \begin{pmatrix} A_k & v_k \\ w_k & a_{k+1} \end{pmatrix}    (9)

is formed in which A_n = A. If the inverse factors L_k, U_k are available for A_k, i.e.,

L_k A_k U_k = D_k,    (10)

then the inverse factors L_{k+1} and U_{k+1} for A_{k+1} are obtained by writing

\begin{pmatrix} L_k & 0 \\ -y_k & 1 \end{pmatrix} \begin{pmatrix} A_k & v_k \\ w_k & a_{k+1} \end{pmatrix} \begin{pmatrix} U_k & -z_k \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} D_k & 0 \\ 0 & d_{k+1} \end{pmatrix}    (11)

in which

A_k z_k = v_k,    (12)
y_k A_k = w_k,    (13)
d_{k+1} = a_{k+1} − w_k z_k = a_{k+1} − y_k v_k.    (14)

Relation (14) can be exploited if one of the systems (12) or (13) is solved exactly. Otherwise we should use

d_{k+1} = a_{k+1} − w_k z_k − y_k v_k + y_k A_k z_k    (15)

instead of Eq. (14). From Eq. (10) we have A_k^{-1} = U_k D_k^{-1} L_k. By substituting this relation in Eqs. (12) and (13) we conclude

z_k = U_k D_k^{-1} L_k v_k,
y_k = w_k U_k D_k^{-1} L_k.

Starting from k = 1, this scheme suggests an obvious algorithm for computing the inverse factors of A. When this scheme is carried out incompletely, an approximate factorization of A^{-1} is obtained. Sparsity can be preserved by keeping only the lfil largest elements in magnitude in z_k or y_k or both of them. This algorithm can be summarized as follows.

Algorithm 3. AIB algorithm
1. Set L_1 = [1], U_1 = [1], A_1 = [a_{11}] and d_1 = a_{11}
2. For k = 1, . . . , n − 1 Do:
3.   Compute z_k = U_k D_k^{-1} L_k v_k and keep the lfil largest elements in absolute value in z_k
4.   Compute y_k = w_k U_k D_k^{-1} L_k and keep the lfil largest elements in absolute value in y_k
5.   Compute d_{k+1} = a_{k+1} − w_k z_k − y_k v_k + y_k A_k z_k
6.   Form L_{k+1}, U_{k+1} and D_{k+1}
7. EndDo
8. Set L := L_n, U := U_n and D := D_n


This algorithm returns L and U such that LAU ≈ D. If A is symmetric, then L = U^T and the work is halved. Furthermore, if A is SPD, then it can be shown [11] that, in exact arithmetic, d_k > 0 for all k; therefore the AIB algorithm will not break down.
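To make the recursion concrete, here is a small dense NumPy sketch of Algorithm 3; it is an illustration only (the function names are ours and the real algorithm works on sparse data structures), and the placement of −z_k and −y_k in the new column of U and row of L follows the factorization (11).

```python
import numpy as np

def keep_largest(v, lfil):
    """Numerical dropping: keep only the lfil largest entries in magnitude."""
    if np.count_nonzero(v) <= lfil:
        return v
    out = np.zeros_like(v)
    idx = np.argsort(-np.abs(v))[:lfil]
    out[idx] = v[idx]
    return out

def aib(A, lfil):
    """Sketch of Algorithm 3 (AIB): returns L, U and d with L A U ~ diag(d)."""
    n = A.shape[0]
    L = np.zeros((n, n)); U = np.zeros((n, n)); d = np.zeros(n)
    L[0, 0] = U[0, 0] = 1.0
    d[0] = A[0, 0]
    for k in range(1, n):                                  # border A_k -> A_{k+1}
        Lk, Uk, dk = L[:k, :k], U[:k, :k], d[:k]
        vk, wk = A[:k, k], A[k, :k]
        zk = keep_largest(Uk @ ((Lk @ vk) / dk), lfil)     # z_k ~ A_k^{-1} v_k
        yk = keep_largest(((wk @ Uk) / dk) @ Lk, lfil)     # y_k ~ w_k A_k^{-1}
        d[k] = A[k, k] - wk @ zk - yk @ vk + yk @ (A[:k, :k] @ zk)   # Eq. (15)
        U[:k, k] = -zk; U[k, k] = 1.0                      # new column of U
        L[k, :k] = -yk; L[k, k] = 1.0                      # new row of L
    return L, U, d
```

With lfil = n (no dropping) and exact arithmetic, L @ A @ U reproduces diag(d), which is a convenient way to test the sketch on a small matrix.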

4. A parallel implementation of the BAIB algorithm

In this section, we first review the BAIB algorithm [13] and then propose the parallel version of this algorithm. In the BAIB algorithm, we consider a sequence of matrices

A_{k+1} = \begin{pmatrix} A_k & V_k \\ W_k & D_{k+1} \end{pmatrix},    (16)

where V_k ∈ R^{s_k × m_k}, W_k ∈ R^{m_k × s_k}, D_{k+1} ∈ R^{m_k × m_k} and A_k ∈ R^{s_k × s_k} for k = 1, . . . , l − 1, in which A_l = A and m_k ≪ n. If the inverse factors L_k and U_k are available for A_k, i.e.,

L_k A_k U_k = D_k,    (17)

then the inverse factors L_{k+1} and U_{k+1} for A_{k+1} are easily obtained by writing

\begin{pmatrix} L_k & 0 \\ -Y_k & \bar{L}_k \end{pmatrix} \begin{pmatrix} A_k & V_k \\ W_k & D_{k+1} \end{pmatrix} \begin{pmatrix} U_k & -Z_k \\ 0 & \bar{U}_k \end{pmatrix} = \begin{pmatrix} D_k & 0 \\ 0 & \bar{D}_k \end{pmatrix}    (18)

in which

\bar{L}_k (D_{k+1} − W_k A_k^{-1} V_k) \bar{U}_k = \bar{D}_k,    (19)
Y_k A_k = \bar{L}_k W_k,    (20)
A_k Z_k = V_k \bar{U}_k.    (21)

Letting R_k = D_{k+1} − W_k A_k^{-1} V_k, Eq. (19) becomes

\bar{L}_k R_k \bar{U}_k = \bar{D}_k.    (22)

This relation shows that the matrices \bar{L}_k and \bar{U}_k are the inverse factors of R_k. Now, from Eq. (17) we have A_k^{-1} = U_k D_k^{-1} L_k. Hence, by substituting this relation in Eqs. (20) and (21), the following relations are obtained:

Y_k = \bar{L}_k W_k A_k^{-1} = \bar{L}_k W_k U_k D_k^{-1} L_k,    (23)
Z_k = A_k^{-1} V_k \bar{U}_k = U_k D_k^{-1} L_k V_k \bar{U}_k.    (24)

Therefore, the inverse factors of A_{k+1} can be obtained by computing Y_k and Z_k from Eqs. (23) and (24) together with the inverse factors of R_k. Starting from k = 1, this scheme suggests an obvious algorithm for computing the inverse factors of A. When this scheme is carried out incompletely, an approximate factorization of A^{-1} is obtained. See [13] for more details. In the case that m_k = 1, k = 1, . . . , n − 1, the algorithm reduces to the AIB algorithm.

The new approach, hereafter referred to as the PBAIB algorithm (the parallel version of the BAIB algorithm), is described as follows. Let T_k = A_k^{-1} V_k and S_k = W_k A_k^{-1}. Then we have

R_k = D_{k+1} − W_k T_k,    (25)
Z_k = T_k \bar{U}_k,    (26)
Y_k = \bar{L}_k S_k.    (27)

These relations suggest a new approach to compute the inverse factors of a matrix which is inherently parallel. In this algorithm, we first compute the matrices T_k and S_k by solving the linear systems with multiple right-hand sides A_k T_k = V_k and A_k^T S_k^T = W_k^T, respectively. Then the inverse factors \bar{L}_k and \bar{U}_k of R_k are computed. Finally, Z_k and Y_k are obtained via Eqs. (26) and (27), respectively. To compute sparse approximate inverse factors, it is enough to carry out all the computations approximately. The new algorithm can be summarized as follows.


Algorithm 4. PBAIB algorithm
1. Let A_1 be the m_1-th leading principal submatrix of A. Compute L_1, U_1 and D_1 such that L_1 A_1 U_1 ≈ D_1
2. For k = 1, . . . , l − 1 Do (in parallel):
3.   Solve A_k T_k = V_k for T_k approximately
4.   R_k := D_{k+1} − W_k T_k
5.   Compute \bar{L}_k, \bar{U}_k and \bar{D}_k such that \bar{L}_k R_k \bar{U}_k ≈ \bar{D}_k
6.   Z_k := T_k \bar{U}_k and apply numerical dropping to Z_k
7.   Solve A_k^T S_k^T = W_k^T for S_k^T approximately
8.   Y_k := \bar{L}_k S_k and apply numerical dropping to Y_k
9. EndDo

Some considerations can be made for this algorithm. It is obvious that if A is symmetric then the work is halved. In steps 1 and 5 of this algorithm, the inverse factors of A_1 and R_k can be computed approximately by the AIB algorithm. Here we note that in steps 3 and 7 two linear systems with multiple right-hand sides should be solved approximately, and this can be carried out in parallel. Another point to mention here is the computation of the matrix T_k. In fact, we do not need to store this matrix: after computing the ith column of T_k, the ith column of R_k is computed. Hence, it is enough to store the matrix R_k, which is of small size, and use a temporary vector for computing the ith column of T_k.
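The following Python sketch outlines Algorithm 4 with dense blocks and exact inner solves, so that the structure of the method is visible; in the paper the two block solves in steps 3 and 7 are performed only approximately by sparse–sparse iterations (Algorithm 1 or 2), and the dropping of steps 6 and 8 is applied. The function name, the uniform block size and the `factor` argument (for which the `aib` sketch above with a large lfil can be used) are our own devices, not part of the paper.

```python
import numpy as np

def pbaib(A, m, factor):
    """Sketch of Algorithm 4 (PBAIB) with blocks of size m.
    `factor(B)` must return (L, U, d) with L B U ~ diag(d); in the paper this
    role is played by the AIB algorithm (applied to A_1 and to each R_k)."""
    n = A.shape[0]
    sizes = [m] * (n // m) + ([n % m] if n % m else [])
    s = sizes[0]
    L, U, d = factor(A[:s, :s])                 # step 1: factors of A_1
    for mk in sizes[1:]:
        Ak, Vk = A[:s, :s], A[:s, s:s + mk]
        Wk, Dk1 = A[s:s + mk, :s], A[s:s + mk, s:s + mk]
        # steps 3 and 7: two independent block solves -> natural parallelism
        Tk = np.linalg.solve(Ak, Vk)            # A_k T_k = V_k
        Sk = np.linalg.solve(Ak.T, Wk.T).T      # A_k^T S_k^T = W_k^T
        Rk = Dk1 - Wk @ Tk                      # step 4, Eq. (25)
        Lb, Ub, db = factor(Rk)                 # step 5: factors of R_k
        Zk = Tk @ Ub                            # step 6 (dropping omitted here)
        Yk = Lb @ Sk                            # step 8 (dropping omitted here)
        # assemble the bordered factors as in Eq. (18)
        L = np.block([[L, np.zeros((s, mk))], [-Yk, Lb]])
        U = np.block([[U, -Zk], [np.zeros((mk, s)), Ub]])
        d = np.concatenate([d, db])
        s += mk
    return L, U, d
```

For instance, pbaib(A, 3, lambda B: aib(B, B.shape[0])) uses the exact AIB sketch on every block, and then L @ A @ U equals diag(d) up to rounding; replacing the two np.linalg.solve calls by a few MR iterations per right-hand side and adding the dropping of steps 6 and 8 gives the approximate, parallel variant studied in the paper.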

5. Analysis and implementation details

In steps 3 and 7 of Algorithm 4 two linear systems with coefficient matrices A_i and A_i^T should be solved approximately. Hence, we need the nonsingularity of all the leading principal submatrices of A. We first introduce the concept of an M-matrix: A = (a_{ij}) is called an M-matrix if a_{ij} ≤ 0 for all i ≠ j, A is nonsingular and A^{-1} ≥ 0. Let A be an M-matrix that is partitioned in block matrix form A = (A_{ij}), where the A_{ii} are square matrices. Then the diagonal blocks A_{ii} of A are M-matrices [1]. Also, let A be an M-matrix that is partitioned in two-by-two block form, i.e.,

A = \begin{pmatrix} B & E \\ F & C \end{pmatrix},    (28)

where B is a square matrix. Then the Schur complement

S = C − FB^{-1}E    (29)

exists and is itself an M-matrix [1]. Now, let A be an SPD matrix (M-matrix) and suppose all steps of Algorithm 4 are done exactly. By definition R_k is the Schur complement of A_{k+1} and is an SPD matrix (M-matrix), since A_{k+1} is a leading principal submatrix of A and is therefore SPD (an M-matrix). So Algorithm 4 will not break down. In [13], for M-matrices, a strategy was proposed for approximating R_k that guarantees the algorithm does not break down when it is carried out approximately.

In order to take advantage of the large number of zero entries of A, special schemes are required to store sparse matrices. The main goal is to store only the nonzero elements. The common storage schemes are the so-called compressed sparse row (CSR) and compressed sparse column (CSC) formats [20]. In the CSR format we create three arrays: a real array A[nnz] and two integer arrays JA[nnz] and IA[n+1], where nnz is the number of nonzero entries in the matrix. Array A stores the nonzero entries of the matrix row by row, from row 1 to n. Array JA stores the column indices of the entries in the array A. Finally, array IA stores the pointers to the beginning of each row in the arrays A and JA. The CSC format is identical to the CSR format except that the columns of the matrix are stored instead of the rows.

In a practical implementation of Algorithm 4 we must have easy access to the matrices A_i, i = 1, . . . , n. This can be done if the matrix A is stored in the skyline format described as follows [20]. In this format, the following arrays are used: DIAG stores the diagonal of A; AL, JAL, IAL store the strict lower part of A in CSR format; and AU, JAU, IAU store the strict upper part of A in CSC format.
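As a small illustration of this storage scheme, the SciPy sketch below splits a matrix into the three pieces just described and extracts a leading principal submatrix; the function names are ours, and SciPy's CSR/CSC classes stand in for the explicit (A, JA, IA) arrays of the text.

```python
import scipy.sparse as sp

def to_skyline(A):
    """Split A into diagonal, strict lower part (CSR) and strict upper part (CSC)."""
    A = sp.csr_matrix(A)
    return A.diagonal(), sp.csr_matrix(sp.tril(A, k=-1)), sp.csc_matrix(sp.triu(A, k=1))

def leading_block(diag, lower, upper, k):
    """Leading k-by-k principal submatrix A_k, kept in the same split form.
    Row i of `lower` only has column indices < i, and column j of `upper`
    only has row indices < j, so truncation never loses entries of A_k."""
    return diag[:k], lower[:k, :k], upper[:k, :k]
```

Reassembling sp.diags(dk) + lk + uk from the truncated pieces reproduces A[:k, :k], which is what makes the extraction of each A_k cheap in this format.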


In the symmetric case we only need to store the lower triangular part of A. It can easily be seen that if A is stored in the skyline format then the matrices A_i can be extracted easily; moreover, they are themselves in the skyline format. To approximate the inverse factors of a matrix by using Algorithm 4 the following parameters are used:

• m: We let m_k = m for k = 1, . . . , l − 1.
• lfil: We use lfil iterations of Algorithm 1 for solving the systems in steps 3 and 7.
• innerlfil: In steps 1 and 5 of Algorithm 4 the inverse factors of A_1 and R_k are computed by using the AIB algorithm (in the same manner as in the PBAIB algorithm). Here innerlfil iterations of Algorithm 1 are used.

Table 1
Test problems information and the number of iterations of unpreconditioned Bi-CGSTAB for convergence

Matrix     n       nnz      Cond-Est        Application                     Unp-Its
SHERMAN1   1000    3750     2.3e+04         Oil reservoir simulation        317
SHERMAN3   5005    20,033   6.9e+16         Oil reservoir simulation        †
SHERMAN4   1104    3786     7.2e+03         Oil reservoir simulation        95
SHERMAN5   3312    20,793   3.9e+05         Oil reservoir simulation        2014
ADD20      2395    17,319   1.8e+04         Circuit modelling               244
ADD32      4960    23,884   2.1e+02         Circuit modelling               43
ORSIRR1    1030    6858     1.0e+02         Oil reservoir simulation        1293
ORSIRR2    886     5970     1.7e+05         Oil reservoir simulation        1255
HOR131     434     4710     1.3e+05         Flow network problem            †
JPWH991    991     6027     7.3e+02         Circuit physics modelling       33
WATT1      1856    11,360   5.4e+09         Petroleum engineering           155
WATT2      1856    11,550   1.4e+12         Petroleum engineering           †
STEAM2     600     13,760   3.5e+06         Enhanced oil recovery           82
SAYLR1     238     1128     1.6e+09         Oil reservoir simulation        †
SAYLR3     1000    3750     1.0e+02         Oil reservoir simulation        283
MEMPLUS    17,758  126,150  Not available   Circuit modelling               979
PDE900     900     4380     2.9e+02         Partial differential equation   76
PDE2961    2961    14,585   9.5e+02         Partial differential equation   140

Table 2
Setup time to compute sparse approximate inverse factors and results for split-preconditioned Bi-CGSTAB

           PBAIB Alg.                            AIB Alg.
Matrix     m_k  P-time    It-time  P-Its         m_k  P-time   It-time  P-Its
SHERMAN1   10   1.87      0.28     42            1    1.59     0.11     25
SHERMAN3   12   57.29     12.96    303           1    33.51    30.87    1006
SHERMAN4   12   2.14      0.22     32            1    1.54     0.17     23
SHERMAN5   12   23.40     1.54     61            1    13.73    0.66     33
ADD20      5    15.55     2.03     97            1    11.86    0.33     18
ADD32      5    57.06     0.38     11            1    39.06    0.22     5
ORSIRR1    10   3.79      2.04     157           1    1.98     3.57     470
ORSIRR2    5    2.69      1.37     161           1    1.64     30.76    †
HOR131     12   1.1       0.22     35            1    0.44     2.37     898
JPWH991    12   2.7       0.11     15            1    1.59     0.06     10
WATT1      5    10.22     0.22     11            1    6.07     0.11     11
WATT2      5    9.50      0.99     53            1    6.92     0.61     40
STEAM2     12   1.32      0.06     5             1    0.49     0.00     1
SAYLR1     5    0.33      0.22     169           1    0.09     1.37     1450
SAYLR3     5    2.2       0.33     42            1    fail     –        –
MEMPLUS    5    1674.43   45.97    265           1    817.34   138.03   796
PDE900     5    2.53      0.22     27            1    1.64     0.11     15
PDE2961    10   26.42     1.82     48            1    20.60    0.77     31

Timings are in seconds.


The matrices Z_k and Y_k computed in steps 6 and 8 are not necessarily sparse. To remedy this problem, after computing Z_k and Y_k only the lfil largest elements in absolute value in each row of Z_k and in each column of Y_k are kept.
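A possible NumPy rendering of this row-wise dropping rule is given below (the function name is ours, and a dense array stands in for the sparse block); the columns of Y_k can be handled by applying the same routine to Y_k^T.

```python
import numpy as np

def drop_rows(Z, lfil):
    """Keep only the lfil largest entries (in absolute value) of each row of Z."""
    Z = Z.copy()
    for row in Z:                                   # each row independently
        if np.count_nonzero(row) > lfil:
            small = np.argsort(np.abs(row))[:-lfil] # all but the lfil largest
            row[small] = 0.0
    return Z

# columns of Y_k: drop_rows(Y.T, lfil).T
```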

Table 3
Results for matrix SHERMAN1 with different values of lfil and innerlfil

lfil  innerlfil  P-time  It-time  P-Its
1     1          0.83    0.28     111
2     2          1.10    0.22     67
3     3          1.32    0.27     58
4     4          1.59    0.22     46
5     5          1.87    0.28     42
6     6          2.09    0.33     41
7     7          2.42    0.38     38
8     8          2.64    0.33     33
9     9          2.85    0.33     32
10    10         3.07    0.33     32

m_k = 10.

Table 4
Results for matrices SHERMAN5 and PDE900 with different values of m_k

       SHERMAN5                      PDE900
m_k    P-time  It-time  P-Its        P-time  It-time  P-Its
6      19.11   1.54     71           1.75    0.22     32
7      19.39   1.54     69           1.81    0.27     31
8      19.66   1.59     68           1.86    0.27     31
9      20.38   1.53     64           1.87    0.28     28
10     20.04   1.87     76           2.04    0.27     27

innerlfil = 6 and lfil = 4.

Fig. 1. Sparsity pattern of matrix SHERMAN1.


Fig. 2. Sparsity pattern of the inverse factors of matrix SHERMAN1 for different values of parameters innerlfil and lfil with mk = 10.


6. Numerical examples

All the numerical experiments presented in this section were computed in double precision with FORTRAN 77 codes on a Pentium 3 – 800EB MHz personal computer. Some general matrices from the Harwell–Boeing collection [8–10] were used for the numerical experiments. These matrices and their generic properties are shown in Table 1. This table gives, for each matrix, the order n, the number of nonzero entries nnz, an estimate of the condition number (Cond-Est) and the application area. The last column of this table gives the number of iterations of unpreconditioned Bi-CGSTAB (Unp-Its) [22] needed for convergence.

Numerical results of an experimental comparison between the PBAIB and AIB algorithms are given in Table 2. In the PBAIB algorithm we let lfil = innerlfil = 5, and in the AIB algorithm lfil = 10 was chosen. For all the examples, the right-hand side of the system of equations was taken such that the exact solution is x = [1, . . . , 1]^T. The stopping criterion

\|b − Ax_i\|_2 < 10^{-7} \|b\|_2

was used and the initial guess was taken to be the zero vector. For each preconditioner, we give the setup time for the preconditioner (P-time), and the number of iterations (P-Its) and the time (It-time) of the split-preconditioned Bi-CGSTAB algorithm for convergence. The parameter m_k for each linear system is also specified. In the tables, a dagger (†) indicates that there was no convergence in 5000 iterations.

Table 2 shows the effectiveness of the PBAIB algorithm. The numerical results in this table show that this method is very effective in improving the convergence of iterative solvers. From the point of view of robustness, we see that the AIB algorithm fails on matrices ORSIRR2 and SAYLR3 and has poor convergence on matrices SHERMAN3, ORSIRR1, HOR131, SAYLR1 and MEMPLUS, but the PBAIB algorithm never fails and on these matrices has reasonable convergence. This is due to the fact that the AIB algorithm involves a long recursion for large matrices, so the round-off errors and the errors introduced by the approximations in the early stages may grow in later steps and cause instability. For the other examples the convergence results of the two algorithms are comparable. As Table 2 shows, the setup time to compute the sparse approximate inverse factors of a matrix by the PBAIB algorithm is usually greater than that of the AIB algorithm for large matrices. This is because computing the sparse approximate solution of a system by Algorithm 1 or 2 is more expensive. Here we note that the main advantage of the PBAIB algorithm over the AIB algorithm is its parallelism.

In Table 3, the numerical results for matrix SHERMAN1 with m_k = 10 and different values of lfil and innerlfil are given. This table shows the effect of increasing these parameters on the convergence results. As expected, increasing these parameters decreases the number of iterations of the split-preconditioned Bi-CGSTAB algorithm for convergence.

In Table 4, the numerical results for matrices SHERMAN5 and PDE900 with different values of m_k and with innerlfil = 6 and lfil = 4 are given. This table shows that the number of iterations of the split-preconditioned Bi-CGSTAB algorithm for convergence depends strongly on the value of the parameter m_k. An important part of the PBAIB algorithm is the choice of an appropriate block size. This problem is left for future work and is under investigation. In Fig. 1 the sparsity pattern of matrix SHERMAN1 is displayed, and in Fig. 2 the sparsity patterns of the inverse factors of matrix SHERMAN1 computed by the PBAIB algorithm for different values of the parameters innerlfil and lfil with m_k = 10 are shown.

7. Conclusion and future works

We have proposed an algorithm to compute sparse approximate inverse factors of a matrix. In this method no sparsity pattern is needed in advance and the computation of the inverse factors can be carried out in parallel. Numerical results show that the new method is very effective in improving the convergence rate of iterative methods. In this method, we have used the block size m_k. Numerical results show that the number of iterations to convergence for the preconditioned system depends on the value of this parameter. Here


the problem is to choose the best block size. Future work may focus on solving this problem. A comparison between the new method and the AIB algorithm was also given.

References

[1] O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, 1996.
[2] M. Benzi, Preconditioning techniques for large linear systems: a survey, J. Comput. Phys. 182 (2002) 418–477.
[3] M. Benzi, M. Tuma, A comparative study of sparse approximate inverse preconditioners, Appl. Numer. Math. 30 (1999) 305–340.
[4] M. Benzi, M. Tuma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput. 19 (1998) 968–994.
[5] E. Chow, Y. Saad, Approximate inverse preconditioners via sparse–sparse iterations, SIAM J. Sci. Comput. 19 (1998) 995–1023.
[6] E. Chow, Y. Saad, Approximate inverse techniques for block-partitioned matrices, SIAM J. Sci. Comput. 18 (1997) 1657–1675.
[7] E. Chow, Y. Saad, ILUS: an incomplete LU preconditioner in sparse skyline format, Int. J. Numer. Methods Fluids 25 (1997) 739–748.
[8] I.S. Duff, A survey of sparse matrix research, Proc. IEEE 65 (1977) 500–535.
[9] I.S. Duff, A.M. Erisman, J.K. Reid, Direct Methods for Sparse Matrices, Clarendon Press, Oxford, 1986.
[10] I.S. Duff, R.G. Grimes, J.G. Lewis, User's guide for the Harwell–Boeing sparse matrix test problems collection, Technical Report RAL-92-086, Rutherford Appleton Laboratory, 1992.
[11] N.I.M. Gould, J.A. Scott, Sparse approximate-inverse preconditioners using norm-minimization techniques, SIAM J. Sci. Comput. 19 (1998) 605–625.
[12] M.J. Grote, T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18 (1997) 838–853.
[13] D.K. Salkuyeh, F. Toutounian, A block version algorithm to approximate inverse factors, Appl. Math. Comput. 162 (2005) 1499–1509.
[14] D.K. Salkuyeh, F. Toutounian, A new approach to compute sparse approximate inverse of an SPD matrix, IUST – Int. J. Eng. Sci. 15 (2004).
[15] D.K. Salkuyeh, F. Toutounian, A sparse–sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix, in review.
[16] L.Y. Kolotilina, A.Y. Yeremin, Factorized sparse approximate inverse preconditioning I. Theory, SIAM J. Matrix Anal. Appl. 14 (1993) 45–58.
[17] L.Y. Kolotilina, A.Y. Yeremin, Factorized sparse approximate inverse preconditioning II: solution of 3D FE systems on massively parallel computers, Int. J. High Speed Comput. 7 (1995) 191–215.
[18] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Press, New York, 1995.
[19] Y. Saad, M.H. Schultz, GMRES: a generalized minimal residual algorithm for nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (1986) 856–869.
[20] Y. Saad, SPARSKIT: a basic tool kit for sparse matrix computations, Technical Report CSRD TR 1029, CSRD, University of Illinois, Urbana, IL, 1990.
[21] P. Sonneveld, CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 14 (1993) 461–469.
[22] H.A. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 12 (1992) 631–644.
[23] J. Zhang, A sparse approximate inverse technique for parallel preconditioning of general sparse matrices, Appl. Math. Comput. 130 (2002) 63–85.