Advances in Engineering Software 31 (2000) 481–491 www.elsevier.com/locate/advengsoft
A new method to solve large linear systems: the algorithm of "recursive reduction"

M. Benhamadou*

Faculté des sciences de Sfax, Université de Sfax, Sfax CP 3038, Tunisia

Received 5 January 1998; revised 23 June 1999; accepted 3 December 1999

* Fax: +216-4-274-437. E-mail address: [email protected] (M. Benhamadou).
Abstract

Let A ∈ R^{n×n} be a regular matrix and b ∈ R^n. We study a new direct method for the resolution of the linear system AX = b, by an algorithm called the algorithm of "recursive reduction". © 2000 Elsevier Science Ltd. All rights reserved. PII: S0965-9978(99)00064-2

Keywords: Large linear systems; Recursive reduction; Numerical analysis
1. Introduction

The appearance of finite elements applied to partial differential equations coming from different problems of engineering (mechanical, aeronautical, civil, electrical, nuclear), physics, naval construction, biology, biomechanics, etc. leads to the solving of large linear systems, a central problem of scientific computation in numerical analysis. Numerical methods have therefore multiplied, giving rise to a wide variety of algorithms, the principal criteria of comparison being numerical stability, speed of computation of the solution, and memory requirements. The majority of the direct methods [4,5], like the Gauss and Cholesky methods, are based on the factorisation of the matrix A into a product LU, where L is a lower triangular matrix and U an upper triangular matrix. The Householder method is associated with the QU factorisation, where Q is an orthogonal matrix and U is an upper triangular matrix.

We present here a new direct numerical method which consists in replacing the resolution of an n × n linear system by that of two (n−1) × (n−1) systems, together with a recursive procedure that reduces the order of the matrix by one unit at each step. This leads to resolutions in R. Precisely, the resolution of an n × n linear system paves the way for the resolution of two (n−1) × (n−1) systems. This recursive procedure then has an exponential complexity; by transforming it into an iterative algorithm, we obtain the same complexity, 2n³/3 flops [1, p. 82], as the Gauss method. This procedure does not factorise the matrix in the usual sense, but it allows the system to be re-solved with a new second member at a cost of only about 2n² flops.

In Section 6 we give a parallelization of the algorithm (A2). In Section 7 we give a numerical example and summarise other similarities with the Gauss method. Finally we give the performances of the recursive reduction method and a numerical comparison with the Gauss method.

In what follows, we will be interested in the study of the following problem (P): "Let A ∈ R^{n×n} be a nonsingular matrix and b ∈ R^n; our aim is to find X ∈ R^n satisfying AX = b." (P)

Notations: we denote by
• a_{i,j} the element in row i and column j of the matrix A;
• A_k the submatrix of A formed by the elements a_{i,j}, i, j = 1,…,k;
• a_j (resp. ã_j) the jth column vector (resp. the jth row vector) of the matrix A;
• a_j^{(k)} ∈ R^k the vector of the first k components of the column a_j (and similarly ã_j^{(k)} for the rows);
• X_k ∈ R^k the vector of components x_j, j = 1,…,k;
• b the (n+1)th column of the augmented matrix in R^{n×(n+1)} obtained by attaching the column b to A; this augmented matrix will always be denoted A.
2. Algorithm principle

The fundamental idea of the algorithm consists in reducing the size of the problem to be solved by one unit, by partitioning it as follows:

[ A_{n−1}       a_n^{(n−1)} ] [ X_{n−1} ]   [ a_{n+1}^{(n−1)} ]
[ ã_n^{(n−1)}   a_{n,n}     ] [ γ       ] = [ a_{n+1,n}       ]

where X_{n−1}, a_n^{(n−1)}, a_{n+1}^{(n−1)} ∈ R^{n−1}, A_{n−1} ∈ R^{(n−1)×(n−1)}, γ ∈ R and a_{n,n} ∈ R.

The development by blocks of the system AX = b reduces problem (P) to the search for X_{n−1} and γ verifying:

A_{n−1} X_{n−1} + γ a_n^{(n−1)} = a_{n+1}^{(n−1)},   ã_n^{(n−1)} X_{n−1} + γ a_{n,n} = a_{n+1,n}   (1)

If det(A_{n−1}) ≠ 0 then the first Eq. (1) is equivalent to:

X_{n−1} = L_{n+1} − γ L_n   (2)

with L_n = A_{n−1}^{−1} a_n^{(n−1)} and L_{n+1} = A_{n−1}^{−1} a_{n+1}^{(n−1)}. By replacing X_{n−1} in the second Eq. (1) we obtain:

γ = (a_{n+1,n} − ã_n^{(n−1)} L_{n+1}) / (a_{n,n} − ã_n^{(n−1)} L_n)   (3)

under the hypothesis a_{n,n} − ã_n^{(n−1)} L_n ≠ 0 (see Proposition 1). We therefore obtain the algorithm (A1):

(A1):
  solve A_{n−1} L_n = a_n^{(n−1)} and A_{n−1} L_{n+1} = a_{n+1}^{(n−1)}   (if A_{n−1} is nonsingular)
  compute γ = (a_{n+1,n} − ã_n^{(n−1)} L_{n+1}) / (a_{n,n} − ã_n^{(n−1)} L_n)   (if a_{n,n} − ã_n^{(n−1)} L_n ≠ 0)
  compute X_{n−1} = L_{n+1} − γ L_n
  the solution of AX = b is then given by X = [X_{n−1} ; γ]

We call the denominator of γ the pivot; we thus obtain the following proposition.

Proposition 1. If A and A_{n−1} are nonsingular, then the pivot is not equal to zero and the vector X = [X_{n−1} ; γ] given by the algorithm (A1) is the unique solution of (P).

Proof. As A and A_{n−1} are nonsingular, using the Frobenius–Schur formula [6, p. 85] we have:

det(A) = det(A_{n−1}) det(a_{n,n} − ã_n^{(n−1)} A_{n−1}^{−1} a_n^{(n−1)})

and therefore a_{n,n} − ã_n^{(n−1)} A_{n−1}^{−1} a_n^{(n−1)} ≠ 0. The pivot is thus not equal to zero, γ is defined, X_{n−1} = L_{n+1} − γ L_n, and X = [X_{n−1} ; γ] is the solution of (P). □
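To fix ideas, here is a minimal NumPy sketch of (A1) (ours, not the paper's; the paper gives no program here). The two (n−1) × (n−1) solves are delegated to numpy.linalg.solve instead of being recursed, and the function name solve_by_one_reduction is our own:

    import numpy as np

    def solve_by_one_reduction(A, b):
        # One (A1) step: solve the n x n system A x = b through two
        # (n-1) x (n-1) solves with the submatrix A_{n-1}.
        n = A.shape[0]
        A_sub = A[:n-1, :n-1]                        # A_{n-1}
        Ln  = np.linalg.solve(A_sub, A[:n-1, n-1])   # A_{n-1} L_n     = a_n^{(n-1)}
        Lnp = np.linalg.solve(A_sub, b[:n-1])        # A_{n-1} L_{n+1} = a_{n+1}^{(n-1)}
        row = A[n-1, :n-1]                           # the row  ã_n^{(n-1)}
        gamma = (b[n-1] - row @ Lnp) / (A[n-1, n-1] - row @ Ln)   # Eq. (3)
        return np.concatenate([Lnp - gamma * Ln, [gamma]])        # X = [X_{n-1}; γ]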
3. Generalization

The idea of the algorithm (A1) is particularly interesting when L_n and L_{n+1} are easy to compute, i.e. when A_{n−1} is easily invertible (diagonal, upper triangular, lower triangular); in this case the problem (P) is solved. Unfortunately, in the general case the submatrix A_{n−1} is not easily invertible, so we suggest proceeding in a recursive manner: reduce the order of A_{n−1} by one unit, and so on, until we reach a submatrix A_1 of order 1, i.e. a resolution in R. By means of this procedure of recursive reduction, the solution of problem (P) is then reconstructed progressively, starting from the solutions supplied at order 1. Let us start by giving an example, n = 4.

Let L_j^{(n−k)} ∈ R^{n−k} be the unknown vector appearing at the kth step of the decomposition, associated with the subsystems A_{n−k} L_j^{(n−k)} = a_j^{(n−k)}, j = n−k+1,…,n+1. At the first step we have A_4 L_5^{(4)} = a_5^{(4)}. Using formulas (2) and (3), the resolution of the four trivial equations in R obtained at the last level of the decomposition permits the "reconstruction" of the solution in three steps. In this example we see that at the step k = 2 the same system A_2 L_3^{(2)} = a_3^{(2)} appears twice, and that at the step k = 3 the same system A_1 L_2^{(1)} = a_2^{(1)} appears four times and A_1 L_3^{(1)} = a_3^{(1)} appears twice. At step k we would solve 2^k systems; in fact certain systems among them appear several times, which gives this method an exponential complexity. The main idea is to replace the recursive algorithm by an iterative one, i.e. to solve at each step k the (n−k+1) linear systems

{A_k L_j^{(k)} = a_j^{(k)}}_{j=k+1}^{n+1}   (4)

The solutions of these linear systems (4) are useful to us afterwards, and we then obtain the following iterative algorithm.
3.1. Algorithm of "recursive reduction"

Let us call (S_n): A_n L_{n+1}^{(n)} = a_{n+1}^{(n)}, and reduce the order of the matrices A_k by one unit at each step; we then introduce the intermediate problems

(S_k): A_k L_j^{(k)} = a_j^{(k)},   k = n−1,…,1;  j = k+1,…,n+1

to arrive at a submatrix A_1 of order 1 and a resolution in R:

(S_1): A_1 L_j^{(1)} = a_j^{(1)} = a_{1,j},   j = 2,…,n+1

Now we suppose that det(A_k) ≠ 0, k = n,…,1 (this will be justified soon after). We apply the algorithm (A2) and, as we have already done for the Gauss method, L_j^{(k)} replaces a_j^{(k)} in place.

Step 1 (initialization): computation of {L_j^{(1)} = a_{1,j}/a_{1,1}}_{j=2}^{n+1}, with det(A_1) = a_{1,1} ≠ 0.

Step k: from the L_j^{(k−1)} and the kth row of A we construct the two "components" of L_j^{(k)}:

γ_j^{(k)} = (a_{k,j}^{(k−1)} − ã_k^{(k−1)} L_j^{(k−1)}) / p_k   and   G_j^{(k)} = L_j^{(k−1)} − γ_j^{(k)} L_k^{(k−1)},   j = k+1,…,n+1

with the pivot p_k = a_{k,k}^{(k−1)} − ã_k^{(k−1)} L_k^{(k−1)}, so that L_j^{(k)} = [G_j^{(k)} ; γ_j^{(k)}]. This is symbolised by a tableau in which the fixed row ã_k^{(k−1)} is confronted with the columns L_j^{(k−1)}, producing under each column the pair (G_j^{(k)}, γ_j^{(k)}). And so on until step n where, knowing L_n^{(n−1)} and L_{n+1}^{(n−1)}, the algorithm (A2) calculates

γ_{n+1}^{(n)} = (a_{n,n+1} − ã_n^{(n−1)} L_{n+1}^{(n−1)}) / (a_{n,n} − ã_n^{(n−1)} L_n^{(n−1)})

and G_{n+1}^{(n)} = L_{n+1}^{(n−1)} − γ_{n+1}^{(n)} L_n^{(n−1)}, which gives

L_{n+1}^{(n)} = [G_{n+1}^{(n)} ; γ_{n+1}^{(n)}]

as the solution of problem (P).

3.2. Comments

In the following the proposed algorithm is detailed:

(A2):
  Initialization of the first row: L_j^{(1)} = a_{1,j}/a_{1,1},  j = 2,…,n+1
  For k = 2,…,n:
    Pivot calculation: p_k := a_{k,k}^{(k−1)} − ã_k^{(k−1)} L_k^{(k−1)}
    For j = k+1,…,n+1:
      γ_j^{(k)} := (a_{k,j}^{(k−1)} − ã_k^{(k−1)} L_j^{(k−1)}) / p_k
      G_j^{(k)} := L_j^{(k−1)} − γ_j^{(k)} L_k^{(k−1)}
      L_j^{(k)} := [G_j^{(k)} ; γ_j^{(k)}]
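As an illustration, the following NumPy sketch implements (A2) without pivoting strategy (the function name and the storage of each L_j^{(k)} in column j of a working array are our own choices, mirroring the in-place replacement of a_j^{(k)} described above):

    import numpy as np

    def recursive_reduction_solve(A, b):
        # Algorithm (A2), no pivoting: works on the augmented matrix [A | b].
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.asarray(b, dtype=float).reshape(n, 1)])
        L = np.zeros((n, n + 1))          # column j holds L_j^{(k)} in rows 0..k-1
        L[0, 1:] = M[0, 1:] / M[0, 0]     # step 1: L_j^{(1)} = a_{1,j} / a_{1,1}
        for k in range(1, n):             # steps k = 2..n of the text (0-based here)
            row = M[k, :k]                # fixed lower part  ã_k^{(k-1)}
            pk = M[k, k] - row @ L[:k, k]             # pivot p_k
            for j in range(k + 1, n + 1):
                gamma = (M[k, j] - row @ L[:k, j]) / pk   # γ_j^{(k)}
                L[:k, j] -= gamma * L[:k, k]              # G_j^{(k)}
                L[k, j] = gamma                           # L_j^{(k)} = [G_j^{(k)}; γ_j^{(k)}]
        return L[:, n]                    # L_{n+1}^{(n)}, the solution of (P)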
3.3. The pivot computation strategy

The algorithm (A2) uses the pivot p_k = a_{k,k}^{(k−1)} − ã_k^{(k−1)} L_k^{(k−1)} to calculate {γ_j^{(k)}}_{j=k+1}^{n+1}. We suggest two strategies to compute the pivot so as to improve the numerical stability.

We start with the choice of a first pivot p_1 = sup_{i∈[1,n]} |a_{i,1}|, strictly positive since det(A) ≠ 0. Next we permute the pivot row and the first row, which is given formally by the left multiplication of the matrix A by a permutation matrix P^{(1)}; we write A^{(1)} = P^{(1)} A.

At the kth step of the first strategy, k ∈ [2,n], the pivot is

p_k = max_{ℓ∈[k,n]} |a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}|

Let i_0 be the subscript of the maximum; we then permute rows k and i_0 of the matrix, equivalent to the left multiplication by a permutation matrix P^{(k)}; so we put A^{(k)} = P^{(k)} A^{(k−1)}. The computation of p_k in the first strategy requires at every step the computation of (n−k+1) inner products, each of which requires (k−1) multiplications and (k−2) additions; in total we obtain (n−k+1)(2k−3) operations, equivalent to 2kn flops.

The second strategy reduces the number of inner products. Choose r_k = sup_{i∈[k,n]} |a_{i,k}^{(k−1)}| as a reference at step k and consider the set

E_k = {ℓ ∈ [k,n] : |a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}| ≥ r_k}

When E_k ≠ ∅, let ℓ_k = min E_k; the pivot p_k is then defined by:

p_k = |a_{ℓ_k,k}^{(k−1)} − ã_{ℓ_k}^{(k−1)} L_k^{(k−1)}|   if E_k ≠ ∅
p_k = max_{ℓ∈[k,n]} |a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}|   otherwise
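The first strategy drops into the same sketch with little change: at step k the candidates a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}, ℓ = k,…,n, are formed, the maximising row is swapped into place, and the reduction continues on the permuted matrix, so the result is the solution of PAx = Pb (cf. Theorem 2 below). Again a hedged sketch with our own naming:

    import numpy as np

    def recursive_reduction_solve_pivoting(A, b):
        # Algorithm (A2) with the first pivoting strategy.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.asarray(b, dtype=float).reshape(n, 1)])
        i0 = int(np.argmax(np.abs(M[:, 0])))      # p_1 = sup |a_{i,1}|
        M[[0, i0]] = M[[i0, 0]]                   # A^{(1)} = P^{(1)} A
        L = np.zeros((n, n + 1))
        L[0, 1:] = M[0, 1:] / M[0, 0]
        for k in range(1, n):
            # candidate pivots  a_{l,k} - ã_l L_k^{(k-1)},  l = k..n
            cand = M[k:, k] - M[k:, :k] @ L[:k, k]
            i0 = k + int(np.argmax(np.abs(cand)))
            M[[k, i0]] = M[[i0, k]]               # A^{(k)} = P^{(k)} A^{(k-1)}
            pk = cand[i0 - k]
            for j in range(k + 1, n + 1):
                gamma = (M[k, j] - M[k, :k] @ L[:k, j]) / pk
                L[:k, j] -= gamma * L[:k, k]
                L[k, j] = gamma
        return L[:, n]                            # solution of PAx = Pb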
Thus the algorithm using the second strategy is by far faster, but with a possible loss of stability.

Remark 1. We notice that for positive definite matrices we can solve problem (P) without using either of these strategies: the numerical stability is assured, as for Cholesky's method.

Theorem 1. If det(A) ≠ 0, there exists a row permutation matrix such that:

for k = 1:   sup_{i∈[1,n]} |a_{i,1}^{(1)}| > 0
∀k ∈ [2,n]:   sup_{i∈[k,n]} |a_{i,k}^{(k−1)} − ã_i^{(k−1)} L_k^{(k−1)}| > 0
Proof. In the first step (k = 1), at least one of the elements a_{i,1} of the first column of the matrix A is different from zero since det(A) ≠ 0. We choose the largest of the nonzero coefficients of the first column of A as the first pivot. Next we permute the pivot row with the first row, and we then have A^{(1)} = P^{(1)} A; thus A_1^{(1)} is nonsingular. We execute the algorithm until step k, k ∈ [2,n] (A_{k−1}^{(k−1)} being nonsingular). We show that if sup_{ℓ∈[k,n]} |a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}| = 0 then det(A) = 0. We denote by A_k^{(k−1)}(ℓ), ℓ ∈ [k,n], the matrix of order k obtained from A_k^{(k−1)} by replacing its last row by the row vector ã_ℓ^{(k−1)}:

A_k^{(k−1)}(ℓ) = [ A_{k−1}^{(k−1)}   a_k^{(k−1)} ]
                 [ ã_ℓ^{(k−1)}       a_{ℓ,k}^{(k−1)} ]

By applying the Frobenius–Schur formula, we obtain:

det A_k^{(k−1)}(ℓ) = det(A_{k−1}^{(k−1)}) det(a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} (A_{k−1}^{(k−1)})^{−1} a_k^{(k−1)})

By using (4):

det A_k^{(k−1)}(ℓ) = det(A_{k−1}^{(k−1)}) (a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)})

If sup_{ℓ∈[k,n]} |a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}| = 0 this implies:

det A_k^{(k−1)}(ℓ) = det(A_{k−1}^{(k−1)}) (a_{ℓ,k}^{(k−1)} − ã_ℓ^{(k−1)} L_k^{(k−1)}) = 0   ∀ℓ ∈ [k,n]

The kth line of each matrix A_k^{(k−1)}(ℓ) is then a linear combination of the others, which implies that the determinant of A is equal to the determinant of the matrix obtained by replacing the rows k+1 to n by the linear combinations which give zero:

det(A) = det [ A_{k−1}^{(k−1)}   B ]
             [ C                 G ]

where C is a nil matrix and G has its first column nil. By applying the Frobenius–Schur formula we obtain:

det(A) = det(A_{k−1}^{(k−1)}) det(G − C (A_{k−1}^{(k−1)})^{−1} B)

As the matrix C is nil, det(A) = det(A_{k−1}^{(k−1)}) det(G), and as the matrix G has its first column nil, det(G) = 0; then we obtain det(A) = 0. □
Proposition 2. Let A ∈ R^{n×n}, b ∈ R^n. By applying the algorithm (A2) with pivot strategy until step k, we have:

det A_k^{(k)} = ε a_{1,1}^{(1)} ∏_{i=2}^{k} (a_{i,i}^{(i−1)} − ã_i^{(i−1)} L_i^{(i−1)}),   ε = ±1

Proof. The demonstration is based on the following recurrence relation:

det A_k^{(k−1)} = det(A_{k−1}^{(k−1)}) (a_{k,k}^{(k−1)} − ã_k^{(k−1)} L_k^{(k−1)})  □

Note. As p_1 = a_{1,1}^{(1)} and p_i = a_{i,i}^{(i−1)} − ã_i^{(i−1)} L_i^{(i−1)}, we obtain:

det A_k^{(k)} = ε a_{1,1}^{(1)} ∏_{i=2}^{k} (a_{i,i}^{(i−1)} − ã_i^{(i−1)} L_i^{(i−1)}) = ε ∏_{i=1}^{k} p_i

In particular, with k = n we obtain:

det(A) = ε ∏_{i=1}^{n} p_i,   ε = ±1   (5)
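Formula (5) makes the determinant a by-product of the reduction: det(A) is the product of the pivots, signed by the parity of the row permutations. A sketch along the above lines (our own naming; no second member is needed):

    import numpy as np

    def det_from_pivots(A):
        # det(A) = ε * p_1 * ... * p_n  (formula (5)), ε = ±1 from row swaps.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        L = np.zeros((n, n))
        sign = 1.0
        i0 = int(np.argmax(np.abs(A[:, 0])))
        if i0 != 0:
            A[[0, i0]] = A[[i0, 0]]; sign = -sign
        det = A[0, 0]                             # p_1 = a_{1,1}
        L[0, 1:] = A[0, 1:] / A[0, 0]
        for k in range(1, n):
            cand = A[k:, k] - A[k:, :k] @ L[:k, k]
            i0 = k + int(np.argmax(np.abs(cand)))
            if i0 != k:
                A[[k, i0]] = A[[i0, k]]; sign = -sign
            pk = cand[i0 - k]
            det *= pk                             # accumulate the pivots
            for j in range(k + 1, n):
                gamma = (A[k, j] - A[k, :k] @ L[:k, j]) / pk
                L[:k, j] -= gamma * L[:k, k]
                L[k, j] = gamma
        return sign * det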
Theorem 2. Let A ∈ R^{n×n} be a nonsingular matrix and b ∈ R^n. There exists a permutation matrix P such that the vector L_{n+1}^{(n)} given by the algorithm (A2) is the solution of PAx = Pb.

Proof. It suffices to apply the algorithm (A2) with pivoting strategy to the augmented matrix A bordered by the second member b. □

4. Computation of the number of elementary operations

We study the complexity of the steps of (A2).
(a) Initialization: (n+1) − 2 + 1 = n divisions.
(b) Step k:
• computation of p_k = a_{k,k}^{(k−1)} − ã_k^{(k−1)} L_k^{(k−1)}: (k−1) multiplications and (k−1) additions/subtractions;
• computation of γ_j^{(k)} = (a_{k,j}^{(k−1)} − ã_k^{(k−1)} L_j^{(k−1)})/p_k, j = k+1,…,n+1: for each of them, (k−1) multiplications, (k−1) additions/subtractions and one division; in total (n+1−k) divisions and (k−1)(n+1−k) multiplications and additions/subtractions;
• computation of G_j^{(k)} = L_j^{(k−1)} − γ_j^{(k)} L_k^{(k−1)}, j = k+1,…,n+1: for fixed j, (k−1) multiplications and (k−1) subtractions; in total (n+1−k)(k−1) multiplications and additions/subtractions.

The algorithm then requires in total:
• divisions: n + Σ_{k=2}^{n} (n+1−k) = n(n+1)/2;
• additions/subtractions, and as many multiplications:

Σ_{k=2}^{n} [(k−1) + 2(k−1)(n+1−k)] = n(n−1)/2 + n(n−1)(n+1)/3 = (2n³+3n²−5n)/6 ∼ n³/3

hence ≈ 2n³/3 flops altogether. The numbers of operations of the Gauss method and of the "recursive reduction" algorithm are summarised in the following table:

The method             (+ and −)           (*)                 (/)
Gauss                  (2n³+3n²−5n)/6      (2n³+3n²−5n)/6      (n²+n)/2
Recursive reduction    (2n³+3n²−5n)/6      (2n³+3n²−5n)/6      (n²+n)/2

There is the same number of operations in both methods, ≈ 2n³/3 flops. Nevertheless, it seems that the exponent 3 is not far from optimal [3]. Taking a closer look, we make the following remark.

Remark 2. In the algorithm (A2) the most expensive part is the computation of the γ_j^{(k)}, and more precisely the computation of the inner products ã_k^{(k−1)} L_j^{(k−1)}. These inner products involve ã_k^{(k−1)}, the kth row of the matrix A, whose (k−1) elements do not change during the iterations. Here we see the importance of the structure of the lower part of the matrix A: if the rows {ã_k}_{k=2}^{n} are sparse (for example A Hessenberg, band or sparse), then the inner products require fewer operations, which is interesting for the numerical computation.

We observe that there is a fixed part, the lower part of the matrix A, and a part that changes during the iterations, the upper part; hence the idea of unifying both structures: one for the fixed lower part of the matrix A (i.e. we do not store the zeros), and a triangular, column-by-column structure for the upper part, augmented by the second member. The Turbo-Pascal program can be seen in Ref. [10, p. 123]; it uses an adequate numbering function g [2, p. 95]:

g : [1,n+1] × [1,n+1] → [1, (n+1)(n+2)/2],   g(i,j) = j(j−1)/2 + i
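The numbering function itself is elementary; only g comes from the text, the commentary around it is ours:

    def g(i, j):
        # 1-based position of entry (i, j), i <= j, in the linear array that
        # stores the triangular structure column by column: g(i,j) = j(j-1)/2 + i.
        return j * (j - 1) // 2 + i

    # The vector L_j^{(k)} then occupies the contiguous slots g(1, j) .. g(k, j),
    # so a whole column is addressed without any indirection.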
4.1. Resolution with many second members

To solve the linear system AX = b by the Gauss method we can proceed in two stages. At the first stage we compute the factorisation A = LU, which is stored in memory. At the second stage we solve successively the two triangular systems

Ly = b,   Ux = y

If we have to solve two systems having the same matrix but different second members, the first stage of factorisation is carried out only once and serves for both resolutions. In our case it suffices to store, at the time of the first resolution, the vectors L_k^{(k−1)} and the pivots p_k, k = 2,…,n, and then, for a new second member a_{n+1}, to compute:

L_{n+1}^{(1)} = a_{1,n+1}/a_{1,1}
for k = 2,…,n:
  γ_{n+1}^{(k)} = (a_{k,n+1}^{(k−1)} − ã_k^{(k−1)} L_{n+1}^{(k−1)}) / p_k
  G_{n+1}^{(k)} = L_{n+1}^{(k−1)} − γ_{n+1}^{(k)} L_k^{(k−1)}
  L_{n+1}^{(k)} = [G_{n+1}^{(k)} ; γ_{n+1}^{(k)}]

The supplementary cost is:
• n divisions;
• twice n(n−1)/2 multiplications and as many additions/subtractions;

in total ≈ 2n² flops, as for a second use of the classical Gauss method.
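A NumPy sketch of this two-stage use (our own function names; without pivoting, for brevity): factor_rr performs the first resolution and keeps the vectors L_k^{(k−1)} and the pivots p_k, and solve_rr then treats any new second member in ≈ 2n² flops.

    import numpy as np

    def factor_rr(A):
        # First resolution on the square matrix alone: store the columns
        # L_k^{(k-1)} (in L) and the pivots p_k (in p) for later reuse.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        L = np.zeros((n, n)); p = np.zeros(n)
        p[0] = A[0, 0]
        L[0, 1:] = A[0, 1:] / A[0, 0]
        for k in range(1, n):
            p[k] = A[k, k] - A[k, :k] @ L[:k, k]
            for j in range(k + 1, n):
                gamma = (A[k, j] - A[k, :k] @ L[:k, j]) / p[k]
                L[:k, j] -= gamma * L[:k, k]
                L[k, j] = gamma
        return L, p

    def solve_rr(A, L, p, b):
        # New second member b: run only the (n+1)th-column recursion,
        # n divisions plus n(n-1) multiplications and additions, ~ 2n^2 flops.
        A = np.asarray(A, dtype=float); b = np.asarray(b, dtype=float)
        n = A.shape[0]
        x = np.zeros(n)
        x[0] = b[0] / A[0, 0]                         # L_{n+1}^{(1)}
        for k in range(1, n):
            gamma = (b[k] - A[k, :k] @ x[:k]) / p[k]  # γ_{n+1}^{(k)}
            x[:k] -= gamma * L[:k, k]                 # G_{n+1}^{(k)}
            x[k] = gamma
        return x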
5. Block matrix inverse

Let M ∈ R^{(p+q)×(p+q)} be a large regular matrix, and suppose M is partitioned as follows:

M = [ A   B ]
    [ C   D ]   with A ∈ R^{p×p}, D ∈ R^{q×q}, p + q = n

Problem: compute M^{−1}. We can use the Gauss–Jordan practical disposition

[ A   B | I_p   O  ]
[ C   D | O     I_q ]

and apply the block algorithm (A2) with the two block second members.

Step 1 (initialisation): we suppose det(A) ≠ O; reducing the first block row gives

[ A^{−1}B   A^{−1}   O ]

Step 2: we insert the new block line as follows:

[ A^{−1}B   A^{−1}   O  ]
[ D         O        I_q ]

and we calculate successively:
• the pivot P = D − CA^{−1}B, which we suppose satisfies det(P) ≠ O;
• γ_3^{(2)} = −P^{−1}CA^{−1} ∈ R^{q×p} and γ_4^{(2)} = P^{−1} ∈ R^{q×q};
• G_3^{(2)} = A^{−1} − A^{−1}Bγ_3^{(2)} and G_4^{(2)} = −A^{−1}Bγ_4^{(2)}.

Hence we easily obtain by the algorithm (A2) the same formula as in [1, p. 22]:

M^{−1} = [ A^{−1} − A^{−1}Bγ_3^{(2)}   −A^{−1}Bγ_4^{(2)} ]   = [ A^{−1} + A^{−1}BP^{−1}CA^{−1}   −A^{−1}BP^{−1} ]
         [ γ_3^{(2)}                   γ_4^{(2)}          ]     [ −P^{−1}CA^{−1}                 P^{−1}        ]   (6)
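Formula (6) is easy to check numerically. The sketch below rebuilds M^{−1} from the blocks γ_3^{(2)}, γ_4^{(2)}, G_3^{(2)}, G_4^{(2)} on a random partition (the sizes p, q and the random generator are arbitrary choices of ours):

    import numpy as np

    rng = np.random.default_rng(0)
    p, q = 3, 2
    M = rng.standard_normal((p + q, p + q))
    A, B = M[:p, :p], M[:p, p:]
    C, D = M[p:, :p], M[p:, p:]
    Ai = np.linalg.inv(A)
    P  = D - C @ Ai @ B                  # the pivot block, assumed nonsingular
    Pi = np.linalg.inv(P)
    g3 = -Pi @ C @ Ai                    # γ_3^{(2)}
    g4 = Pi                              # γ_4^{(2)}
    Minv = np.block([[Ai - Ai @ B @ g3, -Ai @ B @ g4],    # [G_3^{(2)}  G_4^{(2)}]
                     [g3,               g4            ]]) # [γ_3^{(2)}  γ_4^{(2)}]
    assert np.allclose(Minv, np.linalg.inv(M))            # formula (6) holds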
5.1. Computation of the inverse of some special matrices

Let M ∈ R^{(p+q)×(p+q)} be a large regular matrix partitioned as above, where C ∈ R^{q×p} and p + q = n. Suppose that C = O (or, similarly, B = O) in Eq. (6); under the hypotheses det(A) ≠ O and det(D) ≠ O we obtain the simplified formula:

M^{−1} = [ A^{−1}   −A^{−1}BD^{−1} ]
         [ O        D^{−1}         ]   (7)
Application to the revised simplex method. We recall that the revised simplex method [7, p. 681] solves these three equations (where A is partitioned as A = [B, N]):

• B x_B = b: gives the nonzero part of x;
• ᵗz B = ᵗc_B: gives z and the reduced costs ᵗr = ᵗc_N − ᵗz N;
• B v = y: gives v and then the ratios (y is the entering column of N, so that v is the corresponding column of B^{−1}N);

and it never computes B^{−1}N. What it needs is the current B^{−1}. That raises a natural question in matrix algebra: if a column leaves B and is replaced by a column of N, what is the change in B^{−1}? The step starts with a basis matrix B_old and ends with a basis matrix B_new. One column is changed; say the last column of B_old is replaced by the entering column y taken from N. The vector v mentioned above is B_old^{−1} y. We claim that this vector v leads to B_new^{−1} if we multiply by the right matrix:

B_new^{−1} = M^{−1} B_old^{−1}   with   M = [ 1        v_1     ]
                                            [   ⋱      ⋮       ]
                                            [     1    v_{m−1} ]
                                            [          v_m     ]

With Eq. (7) we can calculate the inverse of the matrix M:

M^{−1} = [ 1        −v_1/v_m     ]
         [   ⋱      ⋮            ]
         [     1    −v_{m−1}/v_m ]
         [          1/v_m        ]
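A sketch of this update (our own function name; the argument m_out generalises the text's "last column" to an arbitrary leaving column):

    import numpy as np

    def update_basis_inverse(B_old_inv, y, m_out):
        # Product-form update: column m_out of B_old is replaced by the
        # entering column y. With v = B_old^{-1} y, formula (7) gives the
        # inverse of the eta matrix E = [e_1 .. v .. e_m] in closed form,
        # and B_new^{-1} = E^{-1} B_old^{-1}.
        v = B_old_inv @ y
        E_inv = np.eye(len(v))
        E_inv[:, m_out] = -v / v[m_out]        # entries -v_i / v_m ...
        E_inv[m_out, m_out] = 1.0 / v[m_out]   # ... and 1 / v_m on the diagonal
        return E_inv @ B_old_inv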
Before moving to the programming of the method, we make the following remarks.

Remark 3.

1. The method we have just seen was recursive; with the algorithm (A2) it has become iterative. We can exploit its recursivity to control the solution of the system at each step: indeed, if we are at the kth step and have computed {L_j^{(k)}}_{j=k+1}^{n+1}, then we can verify that expression (4) holds exactly.
2. If we denote by a_p the pth column vector of the matrix A and by ℓ the index of its first nonzero term, i.e. the vector a_p has its first (ℓ−1) components nil and a_{ℓ,p} nonzero, then for every step k < ℓ the system A_k L_p^{(k)} = a_p^{(k)} has the nil vector as solution. We take this property into account when programming for matrices that are sparse in their upper triangular part, by considering a vector of order (n+1), called Indice, with Indice(p) = ℓ (see the sketch after this remark).
3. The method of recursive reduction, like the Gauss method, permits computing the determinant of the matrix A with formula (5). It also permits verifying whether the matrix A is positive definite: it suffices to take the permutation matrices P^{(k)} equal to the identity and to verify that a_{1,1} and the pivots p_k = a_{k,k}^{(k−1)} − ã_k^{(k−1)} L_k^{(k−1)} are all strictly positive for k = 2,…,n. We can also compute the determinants of the submatrices A_k^{(k)} with formula (5).
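A sketch of the Indice bookkeeping of point 2 (our own helper; every column is assumed to have at least one nonzero entry):

    import numpy as np

    def build_indice(A):
        # Indice[p] = 1-based index of the first nonzero entry of column p;
        # for every step k < Indice[p] the solution L_p^{(k)} is the nil vector.
        return [1 + int(np.argmax(A[:, p] != 0)) for p in range(A.shape[1])]

    # In the k-loop of (A2) the update of column j is then guarded by
    #     if k + 1 >= indice[j]: ...    # otherwise L_j^{(k)} stays nil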
6. Parallelisation of the algorithm (A2) of recursive reduction

We study the parallelisation of the algorithm (A2) to solve a linear system AX = b; we consider A as being of size n × (n+1) and we use the formalism of algorithm parallelisation developed in Ref. [9]. We give the following main notations:

• EX(T) is an integer representing the execution time of the task T.
• ≺ is a relation of partial order on the set of tasks: T_i ≺ T_k (i ≠ k) signifies that the execution of the task T_i must be finished before the execution of T_k begins.
• H(G) is by definition the height of the graph G: the number of tasks of the longest path of G.
• L(G) is by definition the width of the graph G: the minimum of the widths of the decompositions of G.

Let us define the following tasks:

Initialization: execute T_{11}: ⟨L_j^{(1)} = a_{1,j}/a_{1,1}⟩, j = 2,…,n+1
For k = 2 to n:
  execute T_{kk}: ⟨p_k = a_{k,k} − ã_k^{(k−1)} L_k^{(k−1)}⟩
  For j = k+1 to n+1:
    execute T_{kj}: ⟨γ_j^{(k)} = (a_{k,j} − ã_k^{(k−1)} L_j^{(k−1)})/p_k⟩
    execute U_{kj}: ⟨G_j^{(k)} = L_j^{(k−1)} − γ_j^{(k)} L_k^{(k−1)}⟩

We neglect the initialisation step because it does not intervene in the computation of {T_{kk}, T_{k,k+1}, U_{k,k+1}}, k = 2,…,n. For k = 2,…,n:

EX(T_{kk}) = 2(k−1) flops
EX(T_{kj}) = 2(k−1) flops for each j, i.e. 2(n−k+1)(k−1) flops for the whole level
EX(U_{kj}) = 2(k−1) flops for each j, i.e. 2(n−k+1)(k−1) flops for the whole level

The total number of operations is ≈ 2n³/3. The precedence constraints are:

T_{kk} ≺ T_{kj},   j = k+1,…,n+1,  k = 1,…,n
T_{kj} ≺ U_{kj},   j = k+1,…,n+1,  k = 2,…,n

6.1. Theoretical analysis of the graph

Hypothesis: EX(T_{kk}) = EX(T_{k,k+1}) = EX(U_{k,k+1}) = b t_k, where b is an integer and t_k a time unit, k = 2,…,n. To determine the optimal time T_opt we must find the longest path of the graph; it is not difficult to see that there exist (n−1) of them, those concerning the tasks {T_{2,j}}_{j=3}^{n+1}. The longest path of the graph is formed by the tasks {T_{kk}, T_{k,k+1}, U_{k,k+1}}, k = 2,…,n. Its length is equal to the time T_opt, which is a lower bound of the time taken by the parallel execution of the whole algorithm; we obtain:

T_opt = Σ_{k=2}^{n} [2b(k−1) + 2b(k−1) + 2b(k−1)] = 6b Σ_{k=2}^{n} (k−1) = 6b n(n−1)/2 = 3bn² + o(n²)

We have a graph where the number of tasks per level decreases. We remark also that the precedence graph, as we have represented it, corresponds to the decomposition by the predecessor. In fact, all the tasks of the same level have the same execution time, and the width of all the decompositions is equal to (n−1); it is then straightforward to construct, using p_opt ≤ (n−1) processors, an algorithm running in time T_opt.
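The independence of the tasks of one level can be mimicked in NumPy by executing a whole level as a single matrix operation, a stand-in for the (n−1) processors of the analysis (sketch, our own naming):

    import numpy as np

    def recursive_reduction_level_sync(A, b):
        # Level-synchronous (A2): at step k the tasks T_kj and U_kj,
        # j = k+1..n+1, are mutually independent and are executed together.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.asarray(b, dtype=float).reshape(n, 1)])
        L = np.zeros((n, n + 1))
        L[0, 1:] = M[0, 1:] / M[0, 0]
        for k in range(1, n):
            pk = M[k, k] - M[k, :k] @ L[:k, k]                   # task T_kk
            gam = (M[k, k+1:] - M[k, :k] @ L[:k, k+1:]) / pk     # all T_kj at once
            L[:k, k+1:] -= np.outer(L[:k, k], gam)               # all U_kj at once
            L[k, k+1:] = gam
        return L[:, n]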
6.2. Graph of tasks

Example for n = 5 (T_{kk} precedes the T_{kj} of its level, and each T_{kj} precedes the U_{kj} below it):

T2,2 → T2,3   T2,4   T2,5   T2,6
         ↓      ↓      ↓      ↓
       U2,3   U2,4   U2,5   U2,6
         ↓
T3,3 → T3,4   T3,5   T3,6
         ↓      ↓      ↓
       U3,4   U3,5   U3,6
         ↓
T4,4 → T4,5   T4,6
         ↓      ↓
       U4,5   U4,6
         ↓
T5,5 → T5,6
         ↓
       U5,6

7. Numerical example

We use the algorithm (A2) without pivoting strategy to solve the linear system A_6 X_6 = b, A ∈ R^{6×6}, b ∈ R^6, whose matrix has entries 0, 1 and −1, with second member b = (2, 1, 2, 1, 2, 0)ᵗ and exact solution X_6 = (1, 1, 1, 1, 1, 1)ᵗ. For k = 1,…,6 the algorithm updates in the augmented tableau the vectors L_j^{(k)} = [G_j^{(k)} ; γ_j^{(k)}], j = k+1,…,7, from the row ã_k^{(k−1)} and the pivot p_k. (The step-by-step tableaux for k = 1,…,6 are omitted here.)
By degrees we obtain the solution X_6 by computing {G_j^{(k)}}_{j=k+1}^{7} and {γ_j^{(k)}}_{j=k+1}^{7}, k = 2,…,6, and consequently:

X_6 = L_7^{(6)} = [G_7^{(6)} ; γ_7^{(6)}]

i.e. the solution of the system A_6 X_6 = b.

Important remark: in practice, when we resolve a linear system AX = b whose matrix contains an upper triangular submatrix of order ℓ, we can solve the (n+1−ℓ) linear systems

{A_ℓ L_j^{(ℓ)} = a_j^{(ℓ)}}_{j=ℓ+1}^{n+1}

directly, and continue the algorithm only for the steps k = ℓ+1,…,n. We can verify this with the previous example, where ℓ = 5: we compute directly the vectors L_j^{(ℓ)} = [G_j^{(ℓ)} ; γ_j^{(ℓ)}], j = ℓ+1,…,n+1, and we continue the algorithm at the steps k = ℓ+1,…,n. If ℓ is large, we save a number of operations. We notice that the example already treated by the algorithm (A2) is easy to program.
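As a usage sketch, the recursive_reduction_solve function given in Section 3 can be exercised on a small system of the same flavour (this 3 × 3 example with entries 0, 1 and −1 is ours, not the article's 6 × 6 one):

    import numpy as np

    A = np.array([[1., -1.,  1.],
                  [0.,  1., -1.],
                  [1.,  1.,  1.]])
    b = np.array([1., 0., 3.])
    x = recursive_reduction_solve(A, b)   # all pivots are nonzero here
    assert np.allclose(A @ x, b)          # x = (1, 1, 1)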
8. Performances of the recursive reduction method

8.1. The performance aspect

• The lower part of the matrix A does not change, unlike in the Gauss method; we can use the previous Remarks 2 and 3, which is very important for the programming.
• For the resolution with many second members (Section 4.1), particularly if the second members are the canonical basis vectors {e_k}, Remark 3 applies fully. This case corresponds to the inversion of regular matrices, where we resolve n linear systems with the vectors of the canonical basis {e_k} as second members. The recursive reduction method without pivoting requires 2n³ flops, while the Gauss method's LU factorisation without pivoting requires 2n³/3 flops [8], in addition to n resolutions of two triangular systems (2n³ flops), and it needs more memory.
• Likewise, this case is favourable for calculation with a locally disturbed second member. To resolve a linear system having the same matrix A and f_2 as a new second member, knowing the solution u_1 of the system Au_1 = f_1 with f_1 ≠ f_2 except on the first ℓ components (f_1: a global loading; f_2: the same global loading plus a local loading), we write Au_1 = f_1, Au_2 = f_2, A(u_1 − u_2) = f_1 − f_2. This case is interesting because the second member f_1 − f_2 has its first ℓ components equal to zero; Remark 3 and Section 4.1 are thus well justified.
• Given a regular domain Ω with a triangular mesh, for example, the stiffness matrix coming from the resolution by finite elements of partial differential equations leads to the resolution of a linear system Ax = b. If we want to refine the mesh in the proximity of a set of points of the domain, in order to have more information about the solution, our method allows us to exploit the previous calculations (Section 4.1) by adding the new lines and the new second members to the previous stiffness matrix. The whole advantage is to apply Remark 3 and Section 4.1, i.e. to have second member vectors having their first ℓ components equal to zero (with the largest possible ℓ); this is easy to realise: it is enough that the supports of the new nodes be disjoint from the supports of the first ℓ ones.
• The refining programming steps are much easier to apply than those of the Gauss method.
• With our method we only store the pivots p_k and the L_{k+1}^{(k)}, i.e. the upper triangular part, whereas in the Gauss method we store the lower part L and the upper part U of the factorisation LU. Moreover, all the transformations are made only on the new second members of the new stiffness matrix, more exactly on the upper triangular part, whereas for the Gauss method the transformations are made on all the new elements of the stiffness matrix.
• The lower part of the stiffness matrix A does not change during the iterations; as in the case of iterative methods, we fully exploit Remark 2 in order to avoid multiplying by zero.
• For regular problems, the stiffness matrix coming from the resolution by finite elements of partial differential equations is positive definite: we have numerical stability without a pivoting strategy (Remark 1).
• The algorithm (A2) is slightly more parallelisable than the Gauss algorithm.
• In optimal shape design [11,12] we consider the formal linear problem A(u)Y_u = B(u), which can represent partial differential equations or an integral equation. Here u denotes the parameters of the structure and Y_u the state (elastic displacement, flow velocity, temperature, …); in practice the veritable unknowns of the problem are the data u: the geometric form, the physical properties of the area, …. The successive derivatives of Y_u are obtained by resolving linear systems that all share the matrix A(u) (see the sketch after this list):

A(u) Y_u^{(n)} = B(u)^{(n)} − Σ_{i=0}^{n−1} C_n^i A(u)^{(n−i)} Y_u^{(i)}

• By locally refining the mesh, we obtain a new augmented stiffness matrix (the new rows and columns border the previous stiffness matrix A), and the second member f_1 also changes locally. In order to resolve the new system, the new columns of the matrix are considered as new second members; we then continue to apply the algorithm (A2) using the previous calculations of the pivots p_k and the L_{k+1}^{(k)}.
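A sketch of that reuse (hypothetical helpers: A_derivs[m] and B_derivs[m] stand for user-supplied derivatives A(u)^{(m)} and B(u)^{(m)}; factor_rr and solve_rr are the functions sketched in Section 4.1):

    from math import comb

    def state_derivatives(A, A_derivs, B_derivs, N):
        # All the systems share the matrix A(u): one reduction serves
        # every right-hand side, each extra solve costing ~ 2n^2 flops.
        L, p = factor_rr(A)                        # done once
        Y = [solve_rr(A, L, p, B_derivs[0])]       # A(u) Y = B(u)
        for n in range(1, N + 1):
            rhs = B_derivs[n].copy()
            for i in range(n):                     # Leibniz correction terms
                rhs -= comb(n, i) * (A_derivs[n - i] @ Y[i])
            Y.append(solve_rr(A, L, p, rhs))
        return Y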
8.2. Funny aspect

The operation count compared with Gauss is exactly the same:

The method             (+ and −)           (*)                 (/)
Gauss                  (2n³+3n²−5n)/6      (2n³+3n²−5n)/6      (n²+n)/2
Recursive reduction    (2n³+3n²−5n)/6      (2n³+3n²−5n)/6      (n²+n)/2

Unusual example: let A be the test matrix defined by a_{i,j} = max(i,j), i,j = 1,…,n, and b the vector of components b_i = Σ_{j=1}^{n} max(i,j)·j, i = 1,…,n. The solution of the system AX = b is the vector X ∈ R^n of components x_j = j, j = 1,…,n. We obtain the following results on a micro Goupil G5 286 (Gpp: Gauss partial pivoting method; RR: recursive reduction method):

n      10     30     50     70      80      90      100
Gpp    0″11   2″25   9″78   23″56   33″12   51″69   1′10″80
RR     0″11   1″27   4″94   12″31   17″85   24″88   33″58

We notice that the recursive reduction method with the first strategy is faster than the Gauss partial pivoting method. This is due to the pivots p_k = ±1, k = 2,…,n, in the recursive reduction method: all the calculations are made in integer arithmetic, whereas in the Gauss method the whole lower triangular part L is formed by fractional numbers.

Compared calculation time: let A be the test matrix defined by

a_{i,j} = i if i ≤ j,   a_{i,j} = j if i > j

and b the vector of components b_i = i Σ_{j=i}^{n} j + Σ_{j=1}^{i−1} j², i = 1,…,n.
The solution of the system AX = b is the vector X ∈ R^n of components x_i = i, i = 1,…,n. We obtain on the micro Goupil G5 286 the following results (Gpp: Gauss partial pivoting method; RR: recursive reduction method):

n      10     30     50      70      80       90       100
Gpp    0″11   2″80   12″74   34″66   51″52    1′13″27  1′40″29
RR     0″11   3″46   16″70   46″63   1′09″97  1′40″19  2′17″81

We observe that the method of recursive reduction is more time consuming than the Gauss method with partial pivoting. This difference is due to the inner products calculation in the search for the pivot, so we can use the second strategy, which is more rapid. However, the recursive reduction method is well adapted to the case of positive definite matrices.

Acknowledgements

I would like to thank Prof. Mohamed Masmoudi for the enlightening discussions we had together, as well as for some useful hints he suggested to me.

References
[1] Gastinel N. Analyse Numérique Linéaire. Paris: Hermann, 1966.
[2] Jennings A. Matrix computation for engineers and scientists. Chichester, UK: Wiley, 1980.
[3] Pan V. How can we speed up matrix computations? SIAM Review 1984;26:393–415.
[4] Ciarlet PG. Introduction à l'Analyse Numérique Matricielle et à l'Optimisation. Paris: Masson, 1980.
[5] Golub GH, Van Loan CF. Matrix computations. 2nd ed. Baltimore, MD: Johns Hopkins University Press, 1989.
[6] Massart J. Cours d'analyse, Tome 4. Complément: Analyse vectorielle, Calcul matriciel, Extremums, Calcul opérationnel, Géométrie analytique dans E3. Paris: Dunod, 1970.
[7] Strang G. Introduction to applied mathematics. Wellesley, MA: Wellesley–Cambridge Press, 1986.
[8] Schatzman M. Analyse Numérique: Cours et Exercices pour la licence. Paris: InterEditions, 1991.
[9] Cosnard M, Robert Y. Algorithmique parallèle: une étude de complexité. Techniques et Science Informatiques 1987;6(2):115–25.
[10] Benhamadou M. Développement d'outils en Programmation Linéaire et Analyse Numérique matricielle. Thèse N° 1955, Université Paul Sabatier Toulouse 3, France, 1994.
[11] Masmoudi M. Calcul des dérivées d'ordre supérieur en conception de formes. CRAS, 1993.
[12] Millo F. Conception Optimale de Structures Rayonnantes. Thèse, Université de Nice-Sophia Antipolis, 1991.