A new method to solve large linear systems: the algorithm of “recursive reduction”

Advances in Engineering Software 31 (2000) 481–491

M. Benhamadou*

Faculté des Sciences de Sfax, Université de Sfax, Sfax CP 3038, Tunisia

Received 5 January 1998; revised 23 June 1999; accepted 3 December 1999

Abstract

Let $A \in \mathbb{R}^{n \times n}$ be a regular matrix and $b \in \mathbb{R}^n$; we study a new direct method for the resolution of the linear system $AX = b$, by an algorithm called the algorithm of "recursive reduction". © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Large linear systems; Recursive reduction; Numerical analysis

1. Introduction

The application of finite elements to partial differential equations arising from various problems of engineering (mechanical, aeronautical, civil, electrical, nuclear), physics, naval construction, biology, biomechanics, etc. leads to the solving of large linear systems, a central problem of scientific computation in numerical analysis. Numerical methods have therefore multiplied, giving rise to a wide variety of algorithms; the principal criteria for their comparison are numerical stability, speed of computation of the solution and memory requirements.

The majority of direct methods [4,5], like the Gauss and Cholesky methods, are based on the factorisation of the matrix $A$ into a product $LU$, where $L$ is a lower triangular matrix and $U$ an upper triangular matrix. The Householder method is associated with the $QU$ factorisation, where $Q$ is an orthogonal matrix and $U$ an upper triangular matrix.

We present here a new direct numerical method which consists in replacing the resolution of an $n \times n$ linear system by that of two $(n-1) \times (n-1)$ systems, and, by a recursive procedure reducing the order of the matrix by one unit at each step, in arriving at resolutions in $\mathbb{R}$. Precisely, the resolution of an $n \times n$ linear system paves the way for the resolution of two $(n-1) \times (n-1)$ systems. This recursive procedure therefore has an exponential complexity; by transforming it into an iterative algorithm, we obtain the same complexity, $2n^3/3$ flops [1, p. 82], as the Gauss method. When using this procedure we do not factorise the matrix in the usual sense, but we can resolve the system with a new second member at a cost of only $n^2$ flops. In Section 6 we give a parallelization of the algorithm (A2). In Section 7 we give a numerical example and summarise other similarities with the Gauss method. Finally we give the performances of the recursive reduction method and a numerical comparison with the Gauss method.

In what follows, we will be interested in the study of the following problem (P): "Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular matrix and $b \in \mathbb{R}^n$; our aim is to find $X \in \mathbb{R}^n$ satisfying $AX = b$." (P)

Notations: we denote by

• $a_{i,j}$ the element in row $i$ and column $j$ of the matrix $A$;
• $A_k$ the submatrix of $A$ formed by the elements $a_{i,j}$, $i, j = 1, \ldots, k$;
• $a_j$ (resp. $\tilde{a}_j$) the $j$th column vector (resp. the $j$th row vector) of the matrix $A$;
• $[X]_k \in \mathbb{R}^k$ the vector of components $x_j$, $j = 1, \ldots, k$;
• $b$ the $(n+1)$th column of the augmented matrix of $\mathbb{R}^{n \times (n+1)}$ obtained by attaching the column $b$ to the matrix $A$; this augmented matrix will always be denoted $A$.

* Fax: +216-4-274-437. E-mail address: [email protected] (M. Benhamadou).

2. Algorithm principle

The fundamental idea of the algorithm consists in reducing the size of the problem to be solved by one unit. We write the system $AX = b$ in block form, partitioning the matrix as follows:

$$\begin{pmatrix} A_{n-1} & [a_n]_{n-1} \\ [\tilde a_n]_{n-1} & a_{n,n} \end{pmatrix} \begin{pmatrix} [X]_{n-1} \\ \gamma \end{pmatrix} = \begin{pmatrix} [a_{n+1}]_{n-1} \\ a_{n+1,n} \end{pmatrix}$$

with $[X]_{n-1}, [a_n]_{n-1}, [a_{n+1}]_{n-1} \in \mathbb{R}^{n-1}$, $A_{n-1} \in \mathbb{R}^{(n-1)\times(n-1)}$, $\gamma \in \mathbb{R}$, $a_{n,n} \in \mathbb{R}$.

The development by blocks of the system $AX = b$ reduces problem (P) to the search for $[X]_{n-1}$ and $\gamma$ verifying:

$$\begin{cases} A_{n-1}[X]_{n-1} + \gamma\,[a_n]_{n-1} = [a_{n+1}]_{n-1} \\ [\tilde a_n]_{n-1}[X]_{n-1} + \gamma\, a_{n,n} = a_{n+1,n} \end{cases} \qquad (1)$$

If $\det(A_{n-1}) \neq 0$, the first equation of (1) is equivalent to:

$$[X]_{n-1} = L_{n+1} - \gamma L_n \qquad (2)$$

with:

$$L_n = A_{n-1}^{-1}[a_n]_{n-1}, \qquad L_{n+1} = A_{n-1}^{-1}[a_{n+1}]_{n-1}$$

By replacing $[X]_{n-1}$ in the second equation of (1) we obtain:

$$\gamma = \frac{a_{n+1,n} - [\tilde a_n]_{n-1}L_{n+1}}{a_{n,n} - [\tilde a_n]_{n-1}L_n} \qquad (3)$$

under the hypothesis $a_{n,n} - [\tilde a_n]_{n-1}L_n \neq 0$ (see Proposition 1). We therefore obtain the algorithm (A1):

$$(A_1)\quad \left\{ \begin{array}{l} \text{Solving: } A_{n-1}L_n = [a_n]_{n-1} \text{ and } A_{n-1}L_{n+1} = [a_{n+1}]_{n-1} \\[4pt] \text{Computing } \gamma = \dfrac{a_{n+1,n} - [\tilde a_n]_{n-1}L_{n+1}}{a_{n,n} - [\tilde a_n]_{n-1}L_n} \\[8pt] \text{Computing } [X]_{n-1} = L_{n+1} - \gamma L_n \end{array} \right.$$

If $A_{n-1}$ is nonsingular and $a_{n,n} - [\tilde a_n]_{n-1}L_n \neq 0$, the solution of $AX = b$ is given by $X = \begin{bmatrix} [X]_{n-1} \\ \gamma \end{bmatrix}$.

We call pivot the denominator of $\gamma$; we thus obtain the following proposition.

Proposition 1. If $A$ and $A_{n-1}$ are nonsingular, then the pivot is not equal to zero and the vector $X = \begin{bmatrix} [X]_{n-1} \\ \gamma \end{bmatrix}$ given by the algorithm $(A_1)$ is the unique solution of (P).

Proof. As $A$ and $A_{n-1}$ are nonsingular, using the Frobenius–Schur formula [6, p. 85] we have:

$$\det(A) = \det(A_{n-1})\,\det\!\big(a_{n,n} - [\tilde a_n]_{n-1}A_{n-1}^{-1}[a_n]_{n-1}\big)$$

and $\det\!\big(a_{n,n} - [\tilde a_n]_{n-1}A_{n-1}^{-1}[a_n]_{n-1}\big) \neq 0$. The pivot is therefore not equal to zero, $\gamma$ is defined, $[X]_{n-1} = L_{n+1} - \gamma L_n$, and then $X = \begin{bmatrix} [X]_{n-1} \\ \gamma \end{bmatrix}$ is the solution of (P). □

3. Generalization

The idea of the algorithm (A1) is particularly interesting when $L_n$ and $L_{n+1}$ are easy to compute, i.e. when $A_{n-1}$ is easily invertible (diagonal, upper triangular, lower triangular); in this case the problem (P) is solved. Unfortunately, in the general case the submatrix $A_{n-1}$ is not easily invertible, so we suggest proceeding in a recursive manner: we reduce the order of $A_{n-1}$ by one unit, and so on until we reach a submatrix $A_1$ of order 1, i.e. a resolution in $\mathbb{R}$. By means of this procedure of recursive reduction, the solution of problem (P) is then reconstructed progressively, starting from the solutions supplied at order 1.

Let us start by giving an example ($n = 4$). Let $L_j^{(n-k)} \in \mathbb{R}^{n-k}$ be the unknown vector appearing at the $k$th step of the decomposition, associated with the subsystems $A_{n-k}L_j^{(n-k)} = [a_j]_{n-k}$, $j = n-k+1, \ldots, n+1$. At the first step we have $A_4L_5^{(4)} = [a_5]_4$. (The original article displays the corresponding tree of subsystems.) Using formulas (2) and (3), the resolution of the four trivial equations in $\mathbb{R}$ obtained at the last stage of the decomposition permits the "reconstruction" of the solution in three steps. In this example we see that, at step $k = 2$, the same system $A_2L_3^{(2)} = [a_3]_2$ appears twice, and that at step $k = 3$ the same system $A_1L_2^{(1)} = [a_2]_1$ appears four times and the same system $A_1L_3^{(1)} = [a_3]_1$ appears twice. At step $k$ we would solve $2^k$ systems; in fact certain systems among them appear several times, which gives this method an exponential complexity. The main idea is to replace the recursive algorithm by an iterative one, i.e. to solve at each step $k$ only the $(n-k+1)$ linear systems

$$\{A_kL_j^{(k)} = [a_j]_k\}_{j=k+1}^{n+1} \qquad (4)$$

The solutions of these linear systems (4) are useful to us afterwards, and we then obtain the following iterative algorithm.

3.1. Algorithm of "recursive reduction"

Let us call $(S_n)$: $A_nL_{n+1}^{(n)} = [a_{n+1}]_n$, and let us reduce the order of the matrices $A_k$ by one unit at each step; we then introduce the intermediate problems

$$(S_k):\ A_kL_j^{(k)} = [a_j]_k, \qquad k = n-1,\ldots,1, \quad j = k+1,\ldots,n+1,$$

to arrive at a submatrix $A_1$ of order 1 and solve in $\mathbb{R}$:

$$(S_1):\ A_1L_j^{(1)} = [a_j]_1 = a_{1,j}, \qquad j = 2,\ldots,n+1.$$

In the following, the proposed algorithm is detailed:

$$(A_2)\quad \left\{ \begin{array}{l} \text{Initialization of the first row: } L_j^{(1)} = \dfrac{a_{1,j}}{a_{1,1}}, \quad j = 2,\ldots,n+1 \\[8pt] \text{For } k = 2,\ldots,n: \\[2pt] \quad \text{Pivot calculation: } p_k := a_{k,k}^{(k-1)} - [\tilde a_k]_{k-1}L_k^{(k-1)} \\[6pt] \quad \text{For } j = k+1,\ldots,n+1: \\[2pt] \qquad \gamma_j^{(k)} = \dfrac{a_{k,j}^{(k-1)} - [\tilde a_k]_{k-1}L_j^{(k-1)}}{p_k} \\[8pt] \qquad \Gamma_j^{(k)} = L_j^{(k-1)} - \gamma_j^{(k)}L_k^{(k-1)} \\[6pt] \qquad L_j^{(k)} = \begin{bmatrix} \Gamma_j^{(k)} \\ \gamma_j^{(k)} \end{bmatrix} \end{array} \right.$$

3.2. Comments

Now we suppose that $\det(A_k) \neq 0$, $k = n,\ldots,1$ (this will be justified soon after). We apply the (A2) algorithm and, as is done in the Gauss method, $L_j^{(k)}$ replaces $[a_j]_k$ in storage.

Step 1 (initialization): computation of $\{L_j^{(1)} = a_{1,j}/a_{1,1}\}_{j=2}^{n+1}$, with $\det(A_1) = a_{1,1} \neq 0$; the first row then holds

$$a_{1,1} \quad L_2^{(1)} \quad L_3^{(1)} \quad \cdots \quad L_j^{(1)} \quad \cdots \quad L_{n+1}^{(1)}$$

Step k: from the $L_j^{(k-1)}$ and the $k$th row of $A$ we construct the two "components" of $L_j^{(k)}$,

$$\gamma_j^{(k)} = \frac{a_{k,j}^{(k-1)} - [\tilde a_k]_{k-1}L_j^{(k-1)}}{p_k} \quad\text{and}\quad \Gamma_j^{(k)} = L_j^{(k-1)} - \gamma_j^{(k)}L_k^{(k-1)}, \qquad j = k+1,\ldots,n+1,$$

which is symbolised by the following table:

$$\begin{array}{c|ccc|ccc} A_{k-1} & L_k^{(k-1)} & \cdots & L_{n+1}^{(k-1)} & \Gamma_{k+1}^{(k)} & \cdots & \Gamma_{n+1}^{(k)} \\ \hline [\tilde a_k]_{k-1} & a_{k,k}^{(k-1)} & \cdots & a_{k,n+1}^{(k-1)} & \gamma_{k+1}^{(k)} & \cdots & \gamma_{n+1}^{(k)} \end{array}$$

and so on until step $n$, where, knowing $L_n^{(n-1)}$ and $L_{n+1}^{(n-1)}$, the (A2) algorithm calculates

$$\gamma_{n+1}^{(n)} = \frac{a_{n,n+1}^{(n-1)} - [\tilde a_n]_{n-1}L_{n+1}^{(n-1)}}{a_{n,n}^{(n-1)} - [\tilde a_n]_{n-1}L_n^{(n-1)}}$$

and $\Gamma_{n+1}^{(n)} = L_{n+1}^{(n-1)} - \gamma_{n+1}^{(n)}L_n^{(n-1)}$, which gives

$$\begin{bmatrix} \Gamma_{n+1}^{(n)} \\ \gamma_{n+1}^{(n)} \end{bmatrix} = L_{n+1}^{(n)}$$

as the solution of problem (P). A minimal implementation sketch follows.
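To make the iteration concrete, here is a minimal sketch of (A2) without pivoting strategy, in Python with NumPy; the function name rr_solve and the column-storage convention are ours, not the paper's. Column $j$ of the working array holds the current $L_{j+1}^{(k)}$, and the pivots $p_k$ are kept because they are reused later (Sections 3.3 and 4.1, Proposition 2).

```python
import numpy as np

def rr_solve(A, b):
    """Recursive reduction (A2), no pivoting: solve A x = b.

    Returns (x, pivots, L): column j of L holds L_{j+1}^{(k)} in the paper's
    notation (frozen once j <= k), and pivots[k-1] = p_k. Assumes all leading
    principal submatrices A_k are nonsingular (Section 3.2).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.asarray(b, dtype=float).reshape(-1, 1)])  # augmented [A | b]
    L = np.zeros((n, n + 1))
    pivots = np.empty(n)
    pivots[0] = M[0, 0]                # p_1 = a_{1,1}
    L[0, 1:] = M[0, 1:] / M[0, 0]      # initialization: L_j^{(1)} = a_{1,j} / a_{1,1}
    for k in range(1, n):              # Python index k is step k+1 of the paper
        row = M[k, :k]                 # the fixed lower row [a~_k]_{k-1}
        p = M[k, k] - row @ L[:k, k]   # pivot p_k
        pivots[k] = p
        for j in range(k + 1, n + 1):  # columns a_{k+2} .. a_{n+1} (and b)
            g = (M[k, j] - row @ L[:k, j]) / p   # gamma_j^{(k)}
            L[:k, j] -= g * L[:k, k]             # Gamma_j^{(k)}
            L[k, j] = g                          # stack [Gamma; gamma]
    return L[:n, n].copy(), pivots, L

# Tiny check on a random matrix shifted to keep the leading minors nonsingular:
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
b = rng.standard_normal(6)
x, pivots, Lmat = rr_solve(A, b)
print(np.max(np.abs(A @ x - b)))   # residual near machine precision
```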

3.3. The pivot computation strategy

The (A2) algorithm uses the pivot $p_k = a_{k,k}^{(k-1)} - [\tilde a_k]_{k-1}L_k^{(k-1)}$ to calculate $\{\gamma_j^{(k)}\}_{j=k+1}^{n+1}$. We suggest two strategies for computing the pivot, in order to improve the numerical stability.

We start by the choice of a first pivot $p_1 = \sup_{i \in [1,n]} |a_{i,1}|$, strictly positive (since $\det(A) \neq 0$). Next we permute the pivot line with the first line, which is given formally by the left multiplication of the matrix $A$ by a permutation matrix $P^{(1)}$; we write $A^{(1)} = P^{(1)}A$.

At the $k$th step of the first strategy ($k \in [2,n]$), the pivot is $p_k = \max_{\ell \in [k,n]} |a_{\ell,k}^{(k-1)} - [\tilde a_\ell]_{k-1}L_k^{(k-1)}|$; let $i_0$ be the subscript of the maximum. We then permute lines $k$ and $i_0$ of the matrix $A$, equivalent to the left multiplication by a permutation matrix $P^{(k)}$; we put $A^{(k)} = P^{(k)}A^{(k-1)}$. The computation of $p_k$ in the first strategy requires at every step $(n-k+1)$ inner products, each of them requiring $(k-1)$ multiplications and $(k-2)$ additions; in total $(n-k+1)(2k-3)$ operations, of the order of $2kn$ flops.

The second strategy reduces the number of inner products. We choose $r_k = \sup_{i \in [k,n]} |a_{i,k}^{(k-1)}|$ as a reference at step $k$ and consider the set

$$E_k = \{\ell \in [k,n] : |a_{\ell,k}^{(k-1)} - [\tilde a_\ell]_{k-1}L_k^{(k-1)}| \ge r_k\}$$

When $E_k \neq \emptyset$, let $\ell_k$ be the subscript associated with the minimum; the pivot $p_k$ is then defined by:

$$p_k = \begin{cases} |a_{\ell_k,k}^{(k-1)} - [\tilde a_{\ell_k}]_{k-1}L_k^{(k-1)}| & \text{if } E_k \neq \emptyset \\[4pt] \max_{\ell \in [k,n]} |a_{\ell,k}^{(k-1)} - [\tilde a_\ell]_{k-1}L_k^{(k-1)}| & \text{otherwise} \end{cases}$$

A sketch of the first strategy follows.
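As a sketch of how the first strategy grafts onto the rr_solve code above (again, the names are ours): each step searches the row of maximal candidate pivot among the remaining rows and exchanges it with row $k$ before dividing. Rows below row $k$ have not been used yet, so the exchange does not invalidate the columns already computed; the result solves $PAx = Pb$ (Theorem 2 below), i.e. the same $x$.

```python
import numpy as np

def rr_solve_pivot(A, b):
    """(A2) with the first pivoting strategy of Section 3.3 (row exchanges)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.asarray(b, dtype=float).reshape(-1, 1)])
    i0 = int(np.argmax(np.abs(M[:, 0])))   # p_1 = sup_i |a_{i,1}|
    M[[0, i0]] = M[[i0, 0]]                # permutation P^{(1)}
    L = np.zeros((n, n + 1))
    L[0, 1:] = M[0, 1:] / M[0, 0]
    for k in range(1, n):
        # candidate pivots |a_{l,k} - [a~_l]_{k-1} L_k^{(k-1)}| for rows l = k..n-1
        s = M[k:, k] - M[k:, :k] @ L[:k, k]
        i0 = k + int(np.argmax(np.abs(s)))
        M[[k, i0]] = M[[i0, k]]            # permutation P^{(k)}
        row, p = M[k, :k], s[i0 - k]       # pivot of the row now in position k
        for j in range(k + 1, n + 1):
            g = (M[k, j] - row @ L[:k, j]) / p
            L[:k, j] -= g * L[:k, k]
            L[k, j] = g
    return L[:n, n].copy()
```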

Thus, the algorithm using the second strategy is by far more rapid, but with an eventual loss of stability.

Remark 1. We notice that for positive definite matrices we can solve problem (P) without using either of these strategies: the numerical stability is assured, as for Cholesky's method.

Theorem 1. If $\det(A) \neq 0$, there exists a row permutation matrix such that:

$$\begin{cases} \sup_{i \in [1,n]} |a_{i,1}^{(1)}| > 0 & \text{for } k = 1 \\[4pt] \sup_{i \in [k,n]} |a_{i,k}^{(k-1)} - [\tilde a_i^{(k-1)}]_{k-1}L_k^{(k-1)}| > 0 & \forall k \in [2,n] \end{cases}$$

Proof. In the first step, $k = 1$: at least one of the elements $a_{i,1}$ of the first column of the matrix $A$ is different from zero, since $\det(A) \neq 0$. We then choose the biggest of the nonzero coefficients of the first column of $A$ as the first pivot. Next we permute the pivot line with the first line, so that $A^{(1)} = P^{(1)}A$; thus $A_1^{(1)}$ is nonsingular.

We execute the algorithm until step $k$, $k \in [2,n]$ ($A_{k-1}^{(k-1)}$ being nonsingular). We show that if $\sup_{\ell \in [k,n]} |a_{\ell,k}^{(k-1)} - [\tilde a_\ell^{(k-1)}]_{k-1}L_k^{(k-1)}| = 0$, then $\det(A) = 0$. We denote by $A_k^{(k-1)}(\ell)$, $\ell \in [k,n]$, the matrix of order $k$ obtained from the submatrix $A_k^{(k-1)}$ by replacing its last line by the vector $[\tilde a_\ell^{(k-1)}]_k$:

$$A_k^{(k-1)}(\ell) = \begin{pmatrix} A_{k-1}^{(k-1)} & [a_k^{(k-1)}]_{k-1} \\[2pt] [\tilde a_\ell^{(k-1)}]_{k-1} & a_{\ell,k}^{(k-1)} \end{pmatrix}$$

By applying the Frobenius–Schur formula, we obtain:

$$\det(A_k^{(k-1)}(\ell)) = \det(A_{k-1}^{(k-1)})\,\det\!\big[a_{\ell,k}^{(k-1)} - [\tilde a_\ell^{(k-1)}]_{k-1}(A_{k-1}^{(k-1)})^{-1}[a_k^{(k-1)}]_{k-1}\big]$$

By using (4):

$$\det(A_k^{(k-1)}(\ell)) = \det(A_{k-1}^{(k-1)})\,\det\!\big[a_{\ell,k}^{(k-1)} - [\tilde a_\ell^{(k-1)}]_{k-1}L_k^{(k-1)}\big]$$

If $\sup_{\ell \in [k,n]} |a_{\ell,k}^{(k-1)} - [\tilde a_\ell^{(k-1)}]_{k-1}L_k^{(k-1)}| = 0$, this implies:

$$\det A_k^{(k-1)}(\ell) = \det(A_{k-1}^{(k-1)})\,\det\!\big[a_{\ell,k}^{(k-1)} - [\tilde a_\ell^{(k-1)}]_{k-1}L_k^{(k-1)}\big] = 0 \qquad \forall \ell \in [k,n]$$

The $k$th line of each matrix $A_k^{(k-1)}(\ell)$ is then a linear combination of the others, which implies that the determinant of $A$ is equal to the determinant of the following matrix, obtained by replacing the lines $k+1$ to $n$ by the linear combination which gives zero:

$$\det(A) = \det\begin{pmatrix} A_{k-1}^{(k-1)} & B \\ C & G \end{pmatrix}$$

where $C$ is a nil matrix and where $G$ has its first column nil. By applying the Frobenius–Schur formula we obtain:

$$\det(A) = \det(A_{k-1}^{(k-1)})\,\det\!\big[G - C(A_{k-1}^{(k-1)})^{-1}B\big]$$

As the matrix $C$ is nil, $\det(A) = \det(A_{k-1}^{(k-1)})\det(G)$, and as the matrix $G$ has its first column nil, $\det(G) = 0$; we then obtain $\det(A) = 0$. □

Proposition 2. Let $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$. By applying the $(A_2)$ algorithm with pivoting strategy until step $k$, we have:

$$\det A_k^{(k)} = \varepsilon\, a_{1,1}^{(1)} \prod_{i=2}^{k} \big[a_{i,i}^{(i)} - [\tilde a_i^{(i)}]_{i-1}L_i^{(i-1)}\big], \qquad \varepsilon = \pm 1$$

Proof. The demonstration is based on the following recurrence relation:

$$\det(A_k^{(k-1)}) = \det(A_{k-1}^{(k-1)})\,\det\!\big[a_{k,k}^{(k-1)} - [\tilde a_k^{(k-1)}]_{k-1}L_k^{(k-1)}\big] \qquad \Box$$

Note. As $p_1 = a_{1,1}^{(1)}$ and $p_i = a_{i,i}^{(i)} - [\tilde a_i^{(i)}]_{i-1}L_i^{(i-1)}$, we obtain:

$$\det A_k^{(k)} = \varepsilon\, a_{1,1}^{(1)} \prod_{i=2}^{k} \big[a_{i,i}^{(i)} - [\tilde a_i^{(i)}]_{i-1}L_i^{(i-1)}\big] = \varepsilon \prod_{i=1}^{k} p_i$$

In particular, with $k = n$, we obtain:

$$\det(A) = \varepsilon \prod_{i=1}^{n} p_i, \qquad \varepsilon = \pm 1 \qquad (5)$$

A quick numerical check follows.
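Formula (5) makes the determinant a by-product of a run; a minimal check with the rr_solve sketch of Section 3 (no row exchanges there, hence $\varepsilon = +1$):

```python
import numpy as np

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 5.0, 1.0],
              [2.0, 1.0, 6.0]])            # positive definite: no pivoting needed (Remark 1)
x, pivots, _ = rr_solve(A, np.ones(3))     # rr_solve from the earlier sketch
print(np.prod(pivots), np.linalg.det(A))   # both equal det(A) = 94
```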

Theorem 2. Let $A \in \mathbb{R}^{n \times n}$ be a nonsingular matrix, and $b \in \mathbb{R}^n$. There exists a permutation matrix $P$ such that the vector $L_{n+1}^{(n)}$ given by the algorithm $(A_2)$ is the solution of $PAx = Pb$.

Proof. It suffices to apply the algorithm (A2) with pivoting strategy to the augmented matrix $A$ bordered by the second member $b$. □

4. Computation of the number of elementary operations

We study the complexity of the steps of (A2):

(a) Initialization: $[(n+1) - 2] + 1 = n$ divisions.
(b) Step k:
• computation of $p_k = a_{k,k}^{(k-1)} - [\tilde a_k]_{k-1}L_k^{(k-1)}$: $(k-1)$ multiplications and $(k-1)$ additions/subtractions;
• computation of $\gamma_j^{(k)} = (a_{k,j}^{(k-1)} - [\tilde a_k]_{k-1}L_j^{(k-1)})/p_k$, $j = k+1,\ldots,n+1$: for each of them, $(k-1)$ multiplications, $(k-1)$ additions/subtractions and one division; in total $(n-k+1)$ divisions and $(n-k+1)(k-1)$ multiplications and additions/subtractions;
• computation of $\Gamma_j^{(k)} = L_j^{(k-1)} - \gamma_j^{(k)}L_k^{(k-1)}$, $j = k+1,\ldots,n+1$: for fixed $j$, $(k-1)$ multiplications and $(k-1)$ subtractions; in total $(n-k+1)(k-1)$ multiplications and additions/subtractions.

The algorithm then requires in total:

• divisions: $n + \sum_{k=2}^{n}(n+1-k) = \dfrac{n(n+1)}{2}$;
• additions/subtractions, and as many multiplications:

$$\sum_{k=2}^{n}\big[(k-1) + 2(k-1)(n-k+1)\big] = \frac{n(n-1)}{2} + \frac{n(n-1)(n+1)}{3} = \frac{2n^3 + 3n^2 - 5n}{6},$$

i.e. about $2n^3/3$ flops in all. The numbers of operations of the Gauss method and of the "recursive reduction" algorithm are summarized in the following table:

The method             (+ and −)               (*)                     (/)
Gauss                  (2n³ + 3n² − 5n)/6      (2n³ + 3n² − 5n)/6      (n² + n)/2
Recursive reduction    (2n³ + 3n² − 5n)/6      (2n³ + 3n² − 5n)/6      (n² + n)/2

There is the same number of operations in both methods, ≈ 2n³/3 flops. Nevertheless, it seems that the exponent 3 is not far from optimal [3]. By taking a closer look we make the following remark.

Remark 2. In the algorithm (A2), the most expensive part is the computation of the $\gamma_j^{(k)}$, and more precisely the computation of the inner products $[\tilde a_k]_{k-1}L_j^{(k-1)}$. These inner products involve $[\tilde a_k]_{k-1}$, the $k$th row of the matrix $A$, having $(k-1)$ elements that do not change during the iterations. Here we see the importance of the structure of the lower part of the matrix $A$: if the rows $\{\tilde a_k\}_{k=2}^{n}$ are sparse (for example $A$ Hessenberg, band, sparse), then the inner products require fewer operations, which is interesting for the numerical computation. We observe that there is a fixed part, the lower part of the matrix $A$, and another part that changes during the iterations, the upper part of $A$; hence the idea of unifying both structures: a compressed structure for the fixed part of the matrix $A$ (i.e. we do not store the zeros), and a triangular, column-by-column structure for the upper part, augmented by the second member. The Turbo-Pascal program can be seen in Ref. [10, p. 123], where we use an adequate numbering function $g$ [2, p. 95]:

$$g : [1, n+1] \times [1, n+1] \to \left[1, \frac{(n+1)(n+2)}{2}\right], \qquad (i,j) \mapsto g(i,j) = \frac{j(j-1)}{2} + i$$

4.1. Resolution with many second members

To solve the linear system $AX = b$ by the Gauss method, we can proceed in two stages. In the first stage we compute the factorisation $A = LU$, which is stored in memory. In the second stage we solve successively the two systems

$$\begin{cases} Ly = b \\ Ux = y \end{cases}$$

If we have to solve two systems having the same matrix but different second members, the first stage of factorisation is carried out only once and allows us to solve both systems. In our case it suffices to store, at the time of the first resolution, the $L_{k+1}^{(k)}$ and the $p_k$, $k = 2,\ldots,n$, and then to compute:

$$\left\{ \begin{array}{l} L_{n+1}^{(1)} = \dfrac{a_{1,n+1}}{a_{1,1}} \\[8pt] \text{for } k = 2,\ldots,n:\quad \gamma_{n+1}^{(k)} = \dfrac{a_{k,n+1}^{(k-1)} - [\tilde a_k]_{k-1}L_{n+1}^{(k-1)}}{p_k}, \quad \Gamma_{n+1}^{(k)} = L_{n+1}^{(k-1)} - \gamma_{n+1}^{(k)}L_k^{(k-1)}, \quad L_{n+1}^{(k)} = \begin{bmatrix} \Gamma_{n+1}^{(k)} \\ \gamma_{n+1}^{(k)} \end{bmatrix} \end{array} \right.$$

The supplementary cost is:

• $n$ divisions;
• twice $n(n-1)/2$ multiplications and additions/subtractions;

in total ≈ 2n² flops, as in a second use of the classical Gauss method. A minimal sketch of this reuse follows.
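The following sketch resolves with a new second member $f$, reusing the pivots $p_k$ and the stored columns $L_{k+1}^{(k)}$ from a first run of the rr_solve sketch of Section 3 (in that storage convention, column $k$ of the array L holds $L_{k+1}^{(k)}$); it assumes no pivoting was used in the first run, and costs about $2n^2$ flops:

```python
import numpy as np

def rr_new_rhs(A, L, pivots, f):
    """Solve A y = f in ~2 n^2 flops, reusing (pivots, L) from rr_solve(A, b)."""
    n = A.shape[0]
    y = np.empty(n)
    y[0] = f[0] / pivots[0]                        # L_{n+1}^{(1)} = f_1 / a_{1,1}
    for k in range(1, n):
        g = (f[k] - A[k, :k] @ y[:k]) / pivots[k]  # gamma_{n+1}^{(k)}
        y[:k] -= g * L[:k, k]                      # Gamma update with the stored L_{k+1}^{(k)}
        y[k] = g
    return y

# Usage: after x, pivots, Lmat = rr_solve(A, b), a second system costs only ~2 n^2 flops:
#   y = rr_new_rhs(np.asarray(A, dtype=float), Lmat, pivots, f)   # solves A y = f
```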

5. Block matrix inverse

Let $M \in \mathbb{R}^{(p+q)\times(p+q)}$ be a large regular matrix, and suppose $M$ is partitioned as follows:

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad A \in \mathbb{R}^{p \times p}, \quad D \in \mathbb{R}^{q \times q}, \quad p + q = n$$

Problem: compute $M^{-1}$. We can use the Gauss–Jordan practical disposition as follows:

$$\begin{array}{cc|cc} A & B & I_p & O \\ C & D & O & I_q \end{array}$$

By applying the block algorithm (A2) with two block second members, we obtain:

Step 1 (initialisation), supposing $\det(A) \neq 0$:

$$\begin{array}{cc|cc} & A^{-1}B & A^{-1} & O \\ C & D & O & I_q \end{array}$$

Step 2: we insert a new block line and calculate successively:

• the pivot $P = D - CA^{-1}B$, which we suppose satisfies $\det(P) \neq 0$;
• $\gamma_3^{(2)} = -P^{-1}CA^{-1} \in \mathbb{R}^{q \times p}$ and $\gamma_4^{(2)} = P^{-1} \in \mathbb{R}^{q \times q}$;
• $\Gamma_3^{(2)} = A^{-1} - A^{-1}B\gamma_3^{(2)}$ and $\Gamma_4^{(2)} = -A^{-1}B\gamma_4^{(2)}$.

Hence we easily obtain, by the algorithm (A2), the same formula as in [1, p. 22]:

$$M^{-1} = \begin{pmatrix} A^{-1} - A^{-1}B\gamma_3^{(2)} & -A^{-1}B\gamma_4^{(2)} \\[2pt] \gamma_3^{(2)} & \gamma_4^{(2)} \end{pmatrix} = \begin{pmatrix} A^{-1} + A^{-1}BP^{-1}CA^{-1} & -A^{-1}BP^{-1} \\[2pt] -P^{-1}CA^{-1} & P^{-1} \end{pmatrix} \qquad (6)$$

A numerical check of this formula follows.
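Formula (6) is easy to check numerically; a minimal sketch with NumPy (random blocks, so that $A$ and the pivot block $P$ are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2
M = rng.standard_normal((p + q, p + q))
A, B = M[:p, :p], M[:p, p:]
C, D = M[p:, :p], M[p:, p:]

Ai = np.linalg.inv(A)
P = D - C @ Ai @ B                 # block pivot P = D - C A^{-1} B
Pi = np.linalg.inv(P)
g3 = -Pi @ C @ Ai                  # gamma_3^{(2)}
g4 = Pi                            # gamma_4^{(2)}
G3 = Ai - Ai @ B @ g3              # Gamma_3^{(2)} = A^{-1} + A^{-1} B P^{-1} C A^{-1}
G4 = -Ai @ B @ g4                  # Gamma_4^{(2)} = -A^{-1} B P^{-1}
Minv = np.block([[G3, G4], [g3, g4]])
print(np.max(np.abs(Minv - np.linalg.inv(M))))   # ~ 1e-15
```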

5.1. Computation of the inverse of some special matrices

Let $M \in \mathbb{R}^{(p+q)\times(p+q)}$ be a large regular matrix, partitioned as above, where $C \in \mathbb{R}^{q \times p}$ and $p + q = n$. Suppose that $C = O$ (or, similarly, $B = O$) in Eq. (6). Then, under the hypothesis $\det(A) \neq 0$ and $\det(D) \neq 0$, we obtain the simplified formula:

$$M^{-1} = \begin{pmatrix} A^{-1} & -A^{-1}BD^{-1} \\ O & D^{-1} \end{pmatrix} \qquad (7)$$

Application to the revised simplex method: we recall that the revised simplex method [7, p. 681] solves these three equations (where $A$ is partitioned as $A = [B, N]$):

$$\begin{cases} Bx_B = b & \text{gives the nonzero part of } x \\ {}^t\!z\,B = {}^t c_B & \text{gives } z \text{ and the reduced costs } r = {}^t c_N - {}^t z\,N \\ Bv = y & \text{gives } v \text{ and then the ratios} \end{cases} \qquad (8)$$

($y$ is the entering column of $N$, so that $v$ is the corresponding column of $B^{-1}N$), and never computes $B^{-1}N$. What it needs is the current $B^{-1}$. That raises a natural question in matrix algebra: if a column leaves $B$ and is replaced by a column of $N$, what is the change in $B^{-1}$?

The step starts with a basis matrix $B_{\text{old}}$ and ends with a basis matrix $B_{\text{new}}$. One column is changed—say the last column of $B_{\text{old}}$ is replaced by the entering column $y$ taken from $N$. The vector $v$ mentioned above is $B_{\text{old}}^{-1}y$. We claim that this vector $v$ leads to $B_{\text{new}}^{-1}$ if we multiply by the right matrix:

$$B_{\text{new}}^{-1} = \begin{pmatrix} 1 & & & v_1 \\ & \ddots & & \vdots \\ & & 1 & v_{m-1} \\ 0 & \cdots & 0 & v_m \end{pmatrix}^{-1} B_{\text{old}}^{-1}$$

With Eq. (7) we can calculate the inverse of the matrix

$$M = \begin{pmatrix} 1 & & & v_1 \\ & \ddots & & \vdots \\ & & 1 & v_{m-1} \\ 0 & \cdots & 0 & v_m \end{pmatrix}, \qquad M^{-1} = \begin{pmatrix} 1 & & & -v_1/v_m \\ & \ddots & & \vdots \\ & & 1 & -v_{m-1}/v_m \\ 0 & \cdots & 0 & 1/v_m \end{pmatrix}$$

A numerical check follows.
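A sketch checking this column-replacement update numerically, combining Eq. (7) with $B_{\text{new}} = B_{\text{old}}M$ (names are ours; it assumes $v_m \neq 0$, as the update requires):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
B_old = rng.standard_normal((m, m)) + 4.0 * np.eye(m)   # a nonsingular basis matrix
y = rng.standard_normal(m)                               # entering column taken from N
v = np.linalg.solve(B_old, y)                            # v = B_old^{-1} y  (v_m != 0 assumed)

Mmat = np.eye(m)
Mmat[:, -1] = v                    # identity with v in its last column: B_new = B_old @ Mmat
M_inv = np.eye(m)
M_inv[:-1, -1] = -v[:-1] / v[-1]   # closed-form inverse given by Eq. (7)
M_inv[-1, -1] = 1.0 / v[-1]

B_new = B_old.copy()
B_new[:, -1] = y                   # last basic column replaced by y
print(np.max(np.abs(M_inv @ np.linalg.inv(B_old) - np.linalg.inv(B_new))))  # ~ 1e-15
```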

Before moving to the programming of the method, we make the following remarks.

Remark 3.

1. The method that we have just seen was recursive; with the algorithm (A2) it has become iterative, and we can exploit its recursivity to control the solution of the system at each step. Indeed, if we are at the $k$th step and we have computed $\{L_j^{(k)}\}_{j=k+1}^{n+1}$, then we can verify that expression (4) holds exactly.

2. If we denote by $a_p$ the $p$th column vector of the matrix $A$ and by $\ell$ the index of its first nonzero term, i.e. the vector $a_p$ has its first $(\ell-1)$ components nil and its $\ell$th component $a_{\ell,p}$ nonzero, then for every step $k < \ell$ the system $A_kL_p^{(k)} = [a_p]_k$ has the nil vector for solution. We take this property into consideration in the programming for matrices that are sparse in their upper triangular part, by considering a vector of order $(n+1)$, called Indice, with Indice$(p) = \ell$, the index of the first nonzero term of column $p$.

3. The method of recursive reduction, like the Gauss method, permits computing the determinant of the matrix $A$ with formula (5). It also permits verifying whether the matrix $A$ is positive definite: it suffices to take the permutation matrices $P^{(k)}$ equal to the identity and to verify that $a_{1,1}$ and $p_k = a_{k,k}^{(k-1)} - [\tilde a_k]_{k-1}L_k^{(k-1)}$ are all strictly positive for $k = 2,\ldots,n$. We can also compute the determinants of the submatrices $A_k^{(k)}$ with formula (5).

6. Parallelisation of the algorithm (A2) of recursive reduction

We study the parallelisation of the algorithm (A2) to solve a linear system $AX = b$; we consider $A$ as being of size $n \times (n+1)$ and we use the formalism of algorithm parallelisation developed in Ref. [9]. We give the following main notations:

• EX(T) is an integer representing the execution time of the task T.
• ≺ is a relation of partial order on the set of tasks: $T_i \prec T_k$, $i \neq k$, signifies that the execution of the task $T_i$ must be finished before the beginning of the execution of $T_k$.
• H(G) is by definition the height of the graph G: the number of tasks on the longest path of G.
• L(G) is by definition the width of the graph G: the minimum of the widths of the decompositions of G.

Let us define the following tasks:

Initialization: execute $T_{1,1}$: $\langle L_j^{(1)} = a_{1,j}/a_{1,1} \rangle$, $j = 2,\ldots,n+1$
For $k = 2$ to $n$:
  execute $T_{k,k}$: $\langle p_k = a_{k,k} - [\tilde a_k]_{k-1}L_k^{(k-1)} \rangle$
  For $j = k+1$ to $n+1$:
    execute $T_{k,j}$: $\langle \gamma_j^{(k)} = (a_{k,j} - [\tilde a_k]_{k-1}L_j^{(k-1)})/p_k \rangle$
    execute $U_{k,j}$: $\langle \Gamma_j^{(k)} = L_j^{(k-1)} - \gamma_j^{(k)}L_k^{(k-1)} \rangle$

We neglect the initialisation step because it does not intervene in the computation of $\{T_{k,k}, T_{k,k+1}, U_{k,k+1}\}$, $k = 2,\ldots,n$. For $k = 2,\ldots,n$:

Ex$(T_{k,k}) = 2(k-1)$ flops;
Ex$(T_{k,j}) = 2(k-1)$ flops for each $j$, i.e. $2(n-k+1)(k-1)$ flops over the whole step;
Ex$(U_{k,j}) = 2(k-1)$ flops for each $j$, i.e. $2(n-k+1)(k-1)$ flops over the whole step.

The total number of operations is ≈ 2n³/3. The precedence constraints are:

$T_{k,k} \prec T_{k,j}$, $j = k+1,\ldots,n+1$, $k = 1,\ldots,n$;
$T_{k,j} \prec U_{k,j}$, $j = k+1,\ldots,n+1$, $k = 2,\ldots,n$.

6.1. Theoretical analysis of the graph

Hypothesis: EX$(T_{k,k})$ = EX$(T_{k,k+1})$ = EX$(U_{k,k+1})$ = $b\,t_k$, where $b$ is an integer and $t_k = 2(k-1)$ a time unit, $k = 2,\ldots,n$. To determine the optimal time $T_{\text{opt}}$, we must find the longest path of the graph; it is not difficult to see that there exist $(n-1)$ of them, those concerning the tasks $\{T_{2,j}\}_{j=3}^{n+1}$. The longest path is formed by the tasks $\{T_{k,k}, T_{k,k+1}, U_{k,k+1}\}$, $k = 2,\ldots,n$. Its length is equal to the time $T_{\text{opt}}$, which is a lower bound of the time taken by the parallel execution of the whole algorithm; we obtain:

$$T_{\text{opt}} = \sum_{k=2}^{n} \big[2b(k-1) + 2b(k-1) + 2b(k-1)\big] = 6b\sum_{k=2}^{n}(k-1) = \frac{6b\,n(n-1)}{2} = 3bn^2 + O(n)$$

We have a graph where the number of tasks per level decreases. We remark also that the precedence graph, as we have represented it, corresponds to the decomposition by predecessors. In fact, all the tasks of the same level have the same execution time, and the width of all decompositions is equal to $(n-1)$; it is straightforward to construct, using $p_{\text{opt}} \le (n-1)$ processors, an algorithm running in time $T_{\text{opt}}$. A vectorized sketch of one level follows.
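In practice the parallelism of one level is exactly the independence of the tasks $T_{k,j}$ and $U_{k,j}$ over $j$: in the rr_solve sketch of Section 3, the inner $j$-loop can be executed simultaneously for all $j$. A sketch, here simply vectorized (on a parallel machine each column $j$ would go to its own processor):

```python
# Drop-in replacement for the inner j-loop of rr_solve: one level of the task graph.
row = M[k, :k]
p = M[k, k] - row @ L[:k, k]                   # task T_{k,k}
g = (M[k, k + 1:] - row @ L[:k, k + 1:]) / p   # all tasks T_{k,j} at once
L[:k, k + 1:] -= np.outer(L[:k, k], g)         # all tasks U_{k,j} at once
L[k, k + 1:] = g
```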

6.2. Graph of tasks

Example for $n = 5$:

    T2,2 --> T2,3   T2,4   T2,5   T2,6
              |      |      |      |
             U2,3   U2,4   U2,5   U2,6
              |
             T3,3 --> T3,4   T3,5   T3,6
                       |      |      |
                      U3,4   U3,5   U3,6
                       |
                      T4,4 --> T4,5   T4,6
                                |      |
                               U4,5   U4,6
                                |
                               T5,5 --> T5,6
                                         |
                                        U5,6

7. Numerical example

We use the algorithm (A2) without pivoting strategy to solve the linear system $A_6X_6 = b$, $A \in \mathbb{R}^{6 \times 6}$, $b \in \mathbb{R}^6$, with

$$b = (2,\ 1,\ 2,\ 1,\ 2,\ 0)^t, \qquad \text{solution: } X_6 = (1,\ 1,\ 1,\ 1,\ 1,\ 1)^t.$$

(The original article displays here the $6 \times 6$ matrix $A$, with entries $0$ and $\pm 1$ and whose leading $5 \times 5$ submatrix is upper triangular, together with the successive working tableaux of the algorithm for $k = 1,\ldots,6$.)

Step by step we obtain the solution $X_6$ by computing

$$\{\Gamma_j^{(k)}\}_{j=k+1}^{7}, \qquad \{\gamma_j^{(k)}\}_{j=k+1}^{7}, \qquad k = 2,\ldots,6,$$

and consequently we have

$$X_6 = \begin{bmatrix} \Gamma_7^{(6)} \\ \gamma_7^{(6)} \end{bmatrix} = L_7^{(6)},$$

i.e. the solution of the system $A_6X_6 = b$.

Important remark: in practice, when we resolve a linear system $AX = b$ whose matrix contains an upper triangular submatrix of order $\ell$, the first $\ell$ steps are immediate: we can solve the $(n+1-\ell)$ linear systems $A_\ell L_j^{(\ell)} = [a_j]_\ell$ directly, $A_\ell$ being triangular, and continue our algorithm for $k = \ell+1,\ldots,n$. We can verify this with the previous example for $\ell = 5$: we compute directly the vectors

$$\left\{ L_j^{(\ell)} = \begin{bmatrix} \Gamma_j^{(\ell)} \\ \gamma_j^{(\ell)} \end{bmatrix} \right\}_{j=\ell+1}^{n+1}$$

For our example, $L_6^{(5)}$ with $\gamma_6^{(5)} = 1$ and $L_7^{(5)}$ with $\gamma_7^{(5)} = 2$ (the corresponding tableau is displayed in the original article).

If $\ell$ is large, we save on the number of operations. We notice that the example already studied with the algorithm (A2) is easy to program.

8. Performances of the recursive reduction method

8.1. The performances aspect

• The lower part of the matrix $A$ does not change, unlike in the Gauss method; we can use the previous Remarks 2 and 3, which is very important for the programming.

• For the resolution with many second members (Section 4.1), particularly if the second-member vectors are the canonical base vectors $\{e_k\}$, Remark 3 is fully exploited. This case corresponds to the inversion of regular matrices, where we resolve $n$ linear systems with the vectors of the canonical base $\{e_k\}$ as second members. The recursive reduction method without pivoting requires $2n^3$ flops, while the Gauss method requires the $LU$ factorisation without pivoting, $2n^3/3$ flops [8], in addition to $n$ resolutions of two triangular systems ($2n^3$ flops), and it needs more memory.

• Likewise, this case is favourable for the computation with a locally disturbed second member. To resolve a linear system having the same matrix $A$ and $f_2$ as a new second member, knowing the solution $u_1$ of the system $Au_1 = f_1$, with $f_1 \neq f_2$ ($f_1$: a global loading; $f_2$: the same global loading in addition to a local loading), we have $Au_1 = f_1$, $Au_2 = f_2$ and $A(u_1 - u_2) = f_1 - f_2$. This case is interesting because the first $\ell$ components of the second-member vector $f_1 - f_2$ are equal to zero; Remark 3 and Section 4.1 are then well justified. Given a regular domain $\Omega$ with a triangular mesh, for example, the stiffness matrix coming from the resolution by finite elements of partial differential equations leads to the resolution of a linear system $Ax = b$. If we want to refine the mesh in the proximity of a set of points of the domain, in order to have more information about the solution, our method allows us to exploit the previous calculations (Section 4.1) by adding the new lines and the new second members to the previous stiffness matrix. The whole advantage is to apply Remark 3 and Section 4.1, which means having second-member vectors whose first $\ell$ components are equal to zero (with the biggest possible $\ell$); this is easy to realise: it is enough that the supports of the new nodes be disjoint from the supports of the first $\ell$ ones.

• The refining programming steps are much easier to apply than the Gauss method ones.

• With our method we only store the pivots $p_k$ and the $L_{k+1}^{(k)}$, i.e. the upper triangular part, whereas in the Gauss method we store the lower part $L$ and the upper part $U$ of the factorisation $LU$. Moreover, all the transformations are made only on the new second members of the new stiffness matrix, more exactly on the upper triangular part, whereas for the Gauss method the transformations are made on all the new elements of the stiffness matrix.

• The lower part of the stiffness matrix $A$ does not change during the iterations; as in the case of iterative methods, we fully exploit Remark 2 in order to avoid multiplying by zero.

• For regular problems, the stiffness matrix coming from the resolution by finite elements of partial differential equations is positive definite: we have numerical stability without a pivoting strategy (Remark 1).

• The algorithm (A2) is slightly more parallelisable than the Gauss algorithm.

• In optimal shape design [11,12] we consider the formal linear problem $A(u)Y_u = B(u)$, which can represent partial differential equations or an integral equation, $u$ being the parameters of the structure and $Y_u$ the state (elastic displacement, velocity flow, temperature, …). In practice, the veritable unknowns of the problem are the data $u$: the geometric form, the physical properties of the area, …. The successive derivatives of $Y_u$ are obtained by resolving linear systems of the following form:

$$A(u)\,Y_u^{(n)} = B^{(n)}(u) - \sum_{i=0}^{n-1} A^{(n-i)}\,Y_u^{(i)}$$

By locally refining the mesh, we obtain a new augmented stiffness matrix, and the second member $f_1$ also changes locally: the new system keeps the old matrix $A$ in its upper-left corner, bordered by the new lines and columns. In order to resolve the new system, the new columns of the matrix are considered like new second members; then we continue to apply the algorithm (A2), using the previous calculations of the pivots $p_k$ and of the $L_{k+1}^{(k)}$.

8.2. Funny aspect

The operation count compared with Gauss is exactly the same:

The method             (+ and −)               (*)                     (/)
Gauss                  (2n³ + 3n² − 5n)/6      (2n³ + 3n² − 5n)/6      (n² + n)/2
Recursive reduction    (2n³ + 3n² − 5n)/6      (2n³ + 3n² − 5n)/6      (n² + n)/2

Compared calculation time: let $A$ be the test matrix defined by

$$a_{ij} = \begin{cases} i/j & \text{if } i \le j \\ j/i & \text{if } i > j \end{cases}$$

and $b$ the vector of components

$$b_i = \sum_{j=i}^{n} i + \frac{1}{i}\sum_{j=1}^{i-1} j^2, \qquad i = 1,\ldots,n.$$

The solution of the system $AX = b$ is the vector $X \in \mathbb{R}^n$ of components $x_i = i$, $i = 1,\ldots,n$. We obtain on a micro Goupil G5 286 the following results (Gpp: Gauss partial pivoting method; RR: recursive reduction method with the first strategy; times in minutes′ seconds″ centiseconds):

n      10    30      50      70      80        90        100
Gpp    11c   2″80c   12″74c  34″66c  51″52c    1′13″27c  1′40″29c
RR     11c   3″46c   16″70c  46″63c  1′09″97c  1′40″19c  2′17″81c

We observe that the method of recursive reduction is more time consuming than the Gauss method with partial pivoting. This difference is due to the inner-product calculations in the search for the pivot, and so we can use the second strategy, which is more rapid. However, the recursive reduction method is well adapted to the case of positive definite matrices.

Unusual example: let $A$ be the test matrix defined by $a_{ij} = \max(i,j)$, $i, j = 1,\ldots,n$, and $b$ the vector of components $b_i = \sum_{j=1}^{n} \max(i,j)\cdot j$, $i = 1,\ldots,n$. The solution of the system $AX = b$ is the vector $X \in \mathbb{R}^n$ of components $x_j = j$, $j = 1,\ldots,n$. We obtain the following results on a micro Goupil G5 286:

n      10    30      50      70      80       90       100
Gpp    11c   2″25c   9″78c   23″56c  33″12c   51″69c   1′10″80c
RR     11c   1″27c   4″94c   12″31c  17″85c   24″88c   33″58c

We notice that the recursive reduction method with the first strategy is faster than the Gauss partial pivoting method. This is due to the pivot computation, $p_k = -1$ for all $k = 2,\ldots,n$, in the recursive reduction method: all the calculations are made in integer arithmetic, whereas in the Gauss method the whole lower triangular part $L$ is formed by fractional numbers. A sketch reproducing this test problem follows.
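The test problems above are easy to reproduce; a sketch for the "unusual example" $a_{ij} = \max(i,j)$ with the rr_solve sketch of Section 3 (its leading minors satisfy $\det(A_k) = (-1)^{k-1}k \neq 0$, so no pivoting is needed; timings on modern hardware will of course not match a Goupil G5 286):

```python
import numpy as np

n = 50
idx = np.arange(1, n + 1)
A = np.maximum.outer(idx, idx).astype(float)   # a_{ij} = max(i, j)
x_exact = idx.astype(float)                    # the known solution x_j = j
b = A @ x_exact                                # b_i = sum_j max(i, j) * j
x, pivots, _ = rr_solve(A, b)
print(np.max(np.abs(x - x_exact)))             # small forward error
```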

Acknowledgements

I would like to thank Prof. Mohamed Masmoudi for the enlightening discussions we had together, as well as for some useful hints he suggested to me.

References

[1] Gastinel N. Analyse Numérique Linéaire. Paris: Hermann, 1966.
[2] Jennings A. Matrix computation for engineers and scientists. Chichester, UK: Wiley, 1980.
[3] Pan V. How can we speed up matrix computations? SIAM Review 1984;26:393–415.
[4] Ciarlet PG. Introduction à l'Analyse Numérique Matricielle et à l'Optimisation. Paris: Masson, 1980.
[5] Golub GH, Van Loan CF. Matrix computations. 2nd ed. Baltimore, MD: Johns Hopkins University Press, 1989.
[6] Massart J. Cours d'analyse, Tome 4. Complément: Analyse vectorielle, Calcul matriciel, Extremums, Calcul opérationnel, Géométrie analytique dans E3. Paris: Dunod, 1970.
[7] Strang G. Introduction to applied mathematics. Wellesley, MA: Wellesley-Cambridge Press, 1986.
[8] Schatzman M. Analyse Numérique: Cours et Exercices pour la licence. Paris: InterEditions, 1991.
[9] Cosnard M, Robert Y. Algorithmique parallèle: une étude de complexité. Techniques et Science Informatiques 1987;6(2):115–25.
[10] Benhamadou M. Développement d'outils en Programmation Linéaire et Analyse Numérique matricielle. Thèse No 1955, Université Paul Sabatier, Toulouse 3, France, 1994.
[11] Masmoudi M. Calcul des dérivées d'ordre supérieur en conception de formes. CRAS, 1993.
[12] Millo F. Conception Optimale de Structures Rayonnantes. Thèse, Université de Nice-Sophia Antipolis, 1991.