Efficient large scaled profile matrix computation programs

Pergamon 0045-7949(94)00370-x EFFICIENT Compurm & Srructurps Vol. 54. No. 4. pp. 731-742. 1995 Copyright (’ 1995 Elsevier Science Ltd Prmted ,n Gre...

Download PDF

813KB Sizes 2 Downloads 83 Views

Report

PDF Reader
Full Text

Pergamon

0045-7949(94)00370-x

EFFICIENT

Compurm & Srructurps Vol. 54. No. 4. pp. 731-742. 1995 Copyright (’ 1995 Elsevier Science Ltd Prmted ,n Great Bntam All rights reserved 0045.7949195 $9.50 + 0.00

LARGE SCALED PROFILE COMPUTATION PROGRAMS

MATRIX

M. S. Park Department

of Mechanical

Engineering,

373-l Kusong-dong,

Korea Advanced Institute of Science and Technology, Yusong-ku, Taejon, 305-701, Korea

(Received 12 July 1993) Abstract--Several efficient programs written in C that handle large scaled matrices, which are usually stored out of the core memory, are provided. A compacted sky-line matrix storage scheme is used for the symmetric coefficient matrices and Grout elimination is used for the solution of linear systems without core blocks. Scalar machine matrix multiplication, canonical multiplication and solutions of nonsymmetric augmented linear system routines are provided, which are easily modified to the vectorized machines. Numerical tests are done by personal computers and engineering workstations. Efficiency of storage and time is considered good, compared to the full in-core routines.

INTRODUCTION

whose size is relatively small compared to its matrix. When the block sky-line method is used, the memory size for the whole matrix storage is smaller than that of the frontal method but the size of core blocks and its accessing time is considerably larger [4]. In this study, the sky-line storage scheme, which is easy to implement on the vectorized machines, is adopted and computation algorithms that use small buffers instead of core blocks are implemented. Routines for multiplication, canonical multiplication, Crout variation of Gaussian elimination and solution of augmented non-symmetric system equations are coded in C and their numerical tests are undertaken by personal computers and engineering workstations. Their performances are also discussed.

Large amounts of memory are required for the analysis of structures and design sensitivities of the structures by finite element method. Almost all of this memory is to store the stiffness matrix, the mass matrix or their sensitivities to the design parameters. These matrices are usually symmetric and have a narrow band near the principal diagonal, so the efficient methods for storing ‘at least’ required coefficients are used. Nevertheless, cases when it is not sufficient are very common in practice. For example, a shell structure having 20,000 dofs needs 160 MBytes of memory (20,000*1000*8 Bytes), though the band matrix with its semi-bandwidth of 1000 is used. Similar amounts of in-core memory are rarely available for normal machines and the out-of-core remedy is necessary. The computation time by using such large scaled matrices is even more. Most of those matrix computations are categorized by multiplications and solution of linear systems of equations. In eigenvalue analysis, for example, such as natural frequency analysis or structural buckling analysis, matrix multiplication and solution of linear systems are required many times (or iteratively). For the band symmetric matrix of n x b, the number of scalar multiplications is nb for one matrix by vector multiplication, nb*/2 for triangularization of the matrix and nb for back substitution of a triangular matrix [ 11. If the matrices reside out of the core, most time is spent by I/O rather than CPU. So the program should handle I/O carefully and attain a minimum access time. The frontal method [2] and the block sky-line method [3] exist for the solution of linear system equations for the out-of-core matrix. Both of them use direct elimination and require ‘core blocks’

Compacted

matrix

storage

scheme

Assume the symmetric coefficient matrix A has the ‘profile’ or ‘sky-line’ shown in Fig. 1. Where elements of the matrix A which are located outside of the dashed line envelope have zero values, those elements are unused during any matrix calculation, i.e. multiplication, factorization and substitution. Also, the envelope or profile remains the same under matrix operations. So we store the matrix in a compact one-dimensional array, which contains the elements that are located inside the envelope and the lower triangular portion by rows. {a} = {A,,, A,,, A,,, A,,, A,3, A.,,, A,,

. . , A,,}. (1)

One piece of additional indexing is the following

information for the matrix (N + 1) integer array:

{L}={O,l,3,5,7,11,15}. 731

(2)

M. S. Park

132

pit

Matrix

$22

Usual lowing:

multiplication--Y matrix

= AX

multiplication

routines

are the fol-

A=

44 44

i%, Fig. 1. A profiled

symmetric

coefficient

matrix

Corresponding statements:

A.

loops

are similar

to the following

For i=1,2,...,N This array is a pointer array to locate diagonal elements. If the computer program is written in C, then array (1) may be a zero based indexing one and array (2) should contain the following values: {L} = {-I, Furthermore, ing indexing

0, 2,4, 6, 10, 14).

the array {L} is preferred scheme:

arithmatic

is performed

:

of the matrix

A:

A profile of the matrix varies not only with problems but also with node numbering methods. When the problem is specified and the finite element mesh is defined, the profile reduction algorithm may define the improved node numbering based upon the element connection given by the user. Then the array {L} in (4) can be determined before analysis as follows: L_,=

Keeping tracks of the loops (9) the sky-lined array (I) should be positioned in order, as in Fig. 2. If the array (1) resides out of the core memory, a positioning time like Fig. 2 prevails for the large scaled

: a[L[i]]

term

index of the ith row:

(i, j) element

(9)

Y,, + Y,, + A ‘,x,,

as follows:

width of the ith row starting

Forj=1,2,...,N

in the follow-

(4)

ith diagonal

Y&+-O

(3)

L[-1]=-1,L[0]=0,L[1]=2,...,L[5]=14. Then the indexing

Fork=1,2,...,M

B(i)=L[i]-L[i-l] S(i) = i -(L[i]

(9

- L[i - I]) + 1

a[S(i) + j]. coefficient array. So the loops must be modified for the minimum accessing time, that is, by the profile oriented loops. Expanding eqn (8) by using symmetry of the matrix A,

-1

for i = 0, 1,.

Y,,, = A,,, X,, + 4,:Xx

,N - 1

B, = max (i -j

i

+ l),

A, # 0,

L,=L,-,+ll,.

+

(10)

+ A,,,,Y,,,

j < i

(6)

In the program developed, large scaled array {a} is stored in a disk media with two small work arrays residing in the core having the size of Mband.

‘\

\

X

Mband = max (B,).

,

(7)

Fig. 2. Track

pointer

of the array plication (8).

\

>

{a} by the multi-

Matrix computation programs Equation (10) shows two parts of array {u}, the lower part and the upper part, and the following routine is proposed for the proper multiplication named in MMULO.

733

time, multiplication (12) is required. Here the multiplication routine is modified to use the Crout version of the factorized form of A. A=LDLT

and

S=X%DL’X,

(14)

{ w-0 where L is a lower triangular matrix with the diagonal terms of unity and D is a diagonal matrix. Both L and D are stored in a single array (u} of equation (I), which is described in the next section. Such a multiplication is composed of the following two steps:

For i=1,2,...,N {w}cith

row of {u}

Forj=1,2,...,i-1 Y’=X%

and

S=Y%Y.

(15)

YZk+ Y,k+ w,x,, In the program ments:

r,, + y,, + w, x,, Y,r+ Y,k+ M’,x,,

(11)

For the compacted, row-wise storage scheme, the array {w} begins with the first non-zero element in each row of A and ends with the diagonal term. Then the second loop of (11) can be modified to avoid multiplication with zeros. In the program MMULO, only one contiguous sweep of the array {u} completes the multiplication. Another form of frequently used matrix computation (e.g. subspace iteration during eigenvalue extraction) is the following canonical multiplication. S = X’AAX,

developed,

are the following

state-

{Y)+O For i=l,2,...,N {w}tith

row of {u}

Forj=1,2,...,i-1

(12) {S)+O

where the matrix A is the mass matrix or the stiffness matrix and X is the (N x M) rectangular matrix. The matrix S has a dimension of (M x M) and retains a symmetry because of the symmetry of the matrix A. Multiplication (12) can be performed by two consecutive multiplications using routine (11). Nevertheless, a memory saving routine is coded for the out-of-core matrix A as follows (MCANO):

Fori=1,2,...,N S.P-& Multiplication loops (16).

+ Y,, Q K,.

(14) is coded

(16) in KCANO

using the

Linear equation solution In many computational mechanics, it is necessary to solve the following linear system of equations:

IS)+0 Fori=1,2,...,N

Ax=b. {w}tith

(17)

row of {a}

Forj=1,2,...,i-1

S,,+ s,, + x,, W,x,,.

(13)

In routine (13), there need not be a (N x M) temporary array and multiplication (12) is completed by one contiguous sweep of array {u}. Sometimes the matrix A of equation (12) has already been factorized. It is true when the linear equation Ax = b should be solved many times without modifying the contents of A and, at the same

As mentioned before, when the size of the matrix A is very large, so that the core memory cannot contain the coefficients of A, it is necessary to store A out of the core area. Then the eqn (17) is solved by certain elimination processes. Two major categories of the solution procedure are the frontal method and the block sky-line method. Both of them use ‘coreblocks’ during elimination, particularly the block sky-line solver, which uses a compact matrix storage scheme of array (I) and, in general, needs blocks whose size is not small. Here the Crout version of the elimination process [5] (A = LDLT is coded without using such blocks, while using sky-line matrix storage. For a

734

M. S. Park

symmetric coefficient formed as follows:

A, factorization

matrix

For,j=l,2

is per-

,_..,

i-l

b,+b, - w,b,. Fori=1,2,...,N {ll,}cith

(20)

Solution of nugnzented non -.s~wunetric equation In recent years. the continuation or arc-length method is used for overcoming the limit point obstacles in the solution algorithms for the non-linear finite element equations. The general form of the continuation procedure can be expressed by augmenting a constraint equation as follows [6]:

row of {u}

Forj=1,2,...,i-1 {tl}+,jth

row of {u}

Ax=Yb--f

c’x+dy

Forj=1,2,...,i-1

w, + w, / D, D,+-M’, ith row of {a}+ It’.

(18)

Statements (18) are implemented in the program FACTO where the goal is the minimal use of disk access and the amount of the core memory, while the total factorization time (CPU time plus I/O time) remains the same as that of the in-core version solver. To obtain a solution by using the factorized coefficient matrix which is stored in the same array {u i. the substitution procedure follows: Lz = b

(forward

substitution)

Dy = z

(diagonal

scaling)

LTx = y

(backward

substitution).

(19)

The out-of-core version of substitution (19) is implemented in the program SOLO and its statements are as follows:

=g,

(23) where C,,,represents the M th unit vector in the dimension of (N + 1). So the last eqn in (23) is the special form of constraint eqn (22). It is known that if the local parameter index m is chosen such that &Z,,, # 0, the singularity of the augmented equation can be by-passed. Where the vector i is the predictor direction or the tangent vector and its dimension is (N + I), the solver for eqn (23), which is described in [7]. is coded according to the statements (24): If m = N + I, then

(9

L’ =g

(ii) solve Ax = y b - f

row of {ui otherwise

/-I

b,eb, - c w,b, I=/

(i) solve A,,?, =

D,+w,

b

0 o

solve &I%2 =

Fori=l,2,...,N b,+b,/D,

where A,, = -I (ii) rZ= row of {ai

+c,,,

Ae,,, 1

T e,,,

Fori=N.N-I,....1 {Ma}tith

(22)

where the RHS term in equation (21) represents the non-equilibrium force vector between the external applied yb and the internal stress resultant force f. The external load is not usually the process control variable, like in the conventional Newton method, but is determined by solving eqns (21) and (22) simultaneously.‘This situation arises in the predictor step, as well as in the corrector steps. In the program developed, the following augmented non-symmetric system of linear equations are solved, at which the m th degree of freedom is a local control parameter:

For i=1,2,...,N {rr)+ith

(21)

(.??I,, +I (-?I ),, +I - 1

0

9, -9,.

Matrix computation programs Table 1. Problem sizes and computation details by PC486 Neq

102

Aband (Mband) Storage

(2)

(kBytes) Factorization (set) Substitution (W

165 (2:)

240

426

(E)

(I;:,

537

942

(1::)

(I::,

17

36

62

148

210

495

3

6

13

3-i

54

156

I

4

6

10

12

19

where the basic ingredient of the solution procedure (24)is to solve the equation A,.? = i. It is solved by merely modifying some contents of the matrix A maintaining the original structure. The augmented coefficient matrix & has the following form: 0

Zl

42

1

2,

A?

0

A,

%I

B7

amI 7

4,wn %I2

0

The solution

r,FZ

=

100

(25)

r2

Z2 Y

:

of eqn (25) is the following

[i

%

71

[:,\I

942

Neq Aband (Mband)

(1;:)

Storage (MBytes) Factorization (set) Substitution (see) Core version Factorization (set)

1998 4056

6024 10332 20199

(E)

$&

(1:;:)

(Z,

(lE,

0.5

1.6

4.9

9.0

20.7

57.3

3.0

1I

50

117

405

1627

0.1

I

2

15

50

158

0.4

2

11

30

93

*

* ran out of core memory. (iii) y = T,~- aII zI - a,, z,, - a,L z2

~=(z,,z,,z2,Y)T.

Computation

h

procedure.

(i) -,n =h (ii) solve

Table 2. Problem sizes and computation details by HP750

and

r1

I I :I B

135

= (1: XI:]

(26)

examples

All of the programs developed herein are linked with a finite element package FYNS, which was developed for research in my laboratory. Numerical performances of two modules are provided. These are the factorization module FACTO and the substitution module SOLO. Problem sizes and numerical results are in Tables 1 and 2. Machines used are PC 486 DX2-66 and HP750. In the tables, Neq represents

Ftimc(scc)

Stime(sec)

-

Fatorization Time 0

Fig. 3. Computation

Substitution Time

time vs Neq plot.

736

M. S. Park 200

UI

1’ 3

+ s

-

Substitution Time 0

Factorization Time

100

1000

a

2

I

I

4

6

8

0

(*lop

NiZq*AtItNd Fig. 4. Computation

_J

time vs Neq x Aband plot.

200

1 2ooo

Ii

u) z $ s

F

LL

100

1ooc

PkpAtEWd' Fig. 5. Computation

time vs Ney x Aband’

plot.

Matrix

computation

the number of equations, Mband is the maximum semi-bandwidth [eqn (7)] and Aband is the following mean band width:

programs

737

program FACTO and SOLO is approximately four times larger than that of the full core version solver. These programs are becoming able to be efficiently used by linear, non-linear or eigenproblem packages.

Neg. REFERENCES

The value of Aband and the storage spaces needed are relatively small because the compact storage scheme is used after the profile reduction execution. The profile reduction algorithm used by FYNS is contained in [8]. Hundreds of equations are computed by PC 486 and thousands and millions of equations are solved by HP 750. The results of the module FACTP, which is the in-core version of the factorization, are shown in Table 2 for comparison purposes. In the tables, the factorization time and the substitution time are the total elapsed time, that is, CPU time plus disk transfering time. Execution time of FACTO is approximately four times larger than that of FACTP. This linear proportion, regardless of the problem size, is considered as the advantage of the current algorithm using hard disk for the coefficient matrix. Execution times are plotted with respect to the values of Neq, Neq x Aband and Neq x Aband in Figs 3, 4 and 5 respectively. The execution time axis is scaled by logarithm in Fig. 3. From these figures, it is shown that the factorization time is proportional to the value of Neq x Aband2, the substitution time is proportional to the value of Neq x Aband. These proportions are the same as those of the band solver if the value of Aband is replaced by the semi-bandwidth. However, the semi-bandwidth of the examples are the value of Mband and its value is considerably larger. CONCLUSION

The programs that perform efficient multiplications and eliminations of symmetric or augmented non-symmetric large scale sky-line matrices reside out of the core area are developed and coded in C. The core blocks, which are in general not small, are not used by the programs but two small buffers are used. Numerical results show that the order of the elimination time is the same for the developed program and the in-core solver. Total elimination time of the

1. 0. C. Zienkiewicz

2.

3.

4.

5. 6. 7.

8.

and R. L. Taylor, The Finite Element Method, 4th Edn, Vol. I, pp. 479495. McGraw-Hill, New York (1989). B. M. Irons, A frontal solution program for finite element analysis. Int. J. Namer Melh. Engng 2, 5-32 (1970). G. Dhatt, G. Touzot & G. Cantin, The Finite Element Method Displayed, pp. 269-291. John Wiley & Sons, New York (1984). E. Thompson and Y. Shimazak, A frontal procedure using skyline storage. Int. J. Numer Merh. Engng 15, 889-910 (1980). T. J. d. Hughes, The Finite Element Method, PP. 633444. Prentice Hall, New Jersey (1987). g Riks, Progress in collapse analysis. >. ‘Pi-es.;. Vessels Technol. 109, 3341 (1987). W. C. Rheinboldt, Numerical analysis of continuation methods for nonlinear structural problems. Comput. Struct. 13, 103-I 13 (1981). B. U. Koo and B. C. Lee, An efficient profile reduction algorithm based on the frontal ordering scheme and the graph theory. Compur. Struct. 44, 1339-1347 (1992).

APPENDIX-PROGRAM

DESCRIPTIONS

Listing of eight C functions are given below. These functions are facto, solo, factp, mmulo, mcano, kcano, solpad and prepad. All of the routines except solpad and prepad are explained in the context by their own names. Necessary documentation for using these routines is inscribed in the listing. All of the arrays should be properly dimensioned according to the documentation. The file of the coefficient matrix must be opened as follows: FILE

‘fp;

fp = fopen (“tile name”,“w

+ b”);-

Then the correct values should be stored into the file *fp. To solve eqn (23) according to the algorithm (24). the following calls of functions are needed.

prepad(n, m, L, NULL, fp, c, w);

[eon (WI

facfo(n, fp. L, w, v, x);

[con (26)l

for new force vectors

f and b

solpad(n, L, NULL, fp. b, c, f, w, x, m);

ieon (24)i

M. S. Park

738

KAIST

PROS0LVE.C

System& DesignLab.

Page: 1 Printed: Savodr 1993/ 6/26 1993/ 6/26

1)#include atdio.h> L t

g;iz

5 6 7 ;

#&fin *fin #define z;iz

10

REAL double INTEGER short int INTEGER4 Long (LIil-L[(il-11-11 ;:I; (Ii)-LIil+L[(i)-ll+l) A(i,jl aCLCil-(il+(jll MAX(x,Yl ((xv1 ? (Xl : (Yll

REAL fscto(n,fp,L,wi,uj,wkl

: A = L.D.L'

wk = DiaaIAl after factorization Trsce(AlVariables : n .... no of equations fo .... file winter to stiffness away . index array L’[“l . . . . dieg&l l w,Iml .... work array * srrajf wjtml . . . . wrk c m = mx(LIil-LCi-111 i=O,l,... l wkIn1 .... work array +..._____..___...________________________________________.,

l

l

l l

:: f: :; g

<.;WGER

i,j,k,m,iO,j~; d I*pOl 'pi lPJ' I

71

I

m = sizeof(REAL1; rcwind(f 1; frcad(Cw IOl,m.l,fpl; for (i=l; iw; i*+l c fseck(fp,mYLCi-11+11,01; frcad(wi,m,(Ltil-LIi-111,fpl; for (j=iO=S(il,pO=wi,fseck(fp,~(LIj-11-LCill.11; j'i; j++,pO++l c fr~d(uj,n,ILtjl-LIj-111,fDl; for (jO=s(Jl,k=HAX~iO,jOl,pi=~it~-iOI,pj=&ujIk-jOl ; kgj; k++,pi++,pj++l PpDl -= Ppil*Ppjl~ for (j=iO,pi=ui; j'i; j++, pi++1 c if (ukCj1 == 01 continue; (*pOl -= (*pil*(*pil/wkCJl; (*pi1 /= wkCj1; 1 uktil = I'pDl; fscck(fp,nP(LIi-11+11,01; fwrite(wi,m,(LCil-LCi-111,fpl; 1

F

:: 4t I.7

. ti

for (i=O,d=O; ia;

;I

i++l d += wkCi1;

return(d);

I INTEGER solo(n,fp,b,L.uil INTEGER n; INTEGER4 LCI; REAL uiII,bCl; FILE lfp; ,.._.___._.________.______________________________________. l SUBSTITUTIDNS (out-of-coreversion) . Output = SOL(X1 t A.x = L.D.L'.x = b l Return = icn if fail (i = zero diag index1 . =" if sucess * Variables : n . . . . no of equations t fp .... file pointer to stiffness array * LC"1 .... diagonal index array * uiIml .... work array t m = nux(LIil-L[i-111 i=O,l,... * bCn1 .... rhs vector (input1 * solution (output1 ._____...__.._.___..______~_____~~__~~~~__~~~_-~~~~_--~~_~, 1 <.INTEGER i,j,n,iO; REAL d: ,I

1

:

= sizeof(REAL1; rcwind(fp1; bI01 = bt01; for (i=l,fscck(fp,lrl,O?;i
2

for (i=O,rcwind(fpl;ia;

E 8:

l

i++,fscck(fp,~(LIil-LIi-11-11.111

Matrix computation programs

KAIST

PROS0LVE.C

S! teas&DesignLab.

77 ii ii ;i ii ii

C frebd(bd.m,l,fp);

if (d =I 0) return(i); ) btil /= d;

for (i=n-1; i>O; --i) for (j~iO=S(i),f~cek(fp~n*(LCi-ll+l~,O~,frced(wi,m,(Ltil-Lti-ll~,fP~; j
I!!

REAL frctp(n,a,L)

::s E E 108

!Z 111 1:: 114 115 116 117 118 119 120 121 122

Page

INTEGER n; INTEGER4 LCI; REAL (111; ,~........................................................* l FACTDRl2ATION (profile in-Core Version) : A = L.D.L' : Return Trace(A) l Variables : n .... no of equations . a .... stiffness array l L(nl .... diagonal index array . ..........................................................

C ;:W:GER

::t 125 126

i,j,k; d , *pO, *pi, lpjI

for (i=l,pO=ta[ll; ia; i++,pO++) C for (j=S(i); J'i; j*+,pO+*) for (k=mAx(s(i),S(j)),pi=&A(i,t).pj=Wj.k); kgj; k++.pi++.pj++) (*pO) -= (*pi)*(*pj)for (j=S(i),pi=M(i,j),pj=biLtjll; j
i+*) d

l

= atL(ill;

return(d);

::8' 1 129 z INTEGER nmulo(n,m,L,fs,x,y,w) INTEGER n,m; 132 INTEGER4 L(1; :% REAL x(l,y(l,u~l; lfa; 135 FILE 136 137 ,*............................................* l IUILTIPLICATIDN * an a Profile Matrix A (Mass) is * Dutput = Y * V = A.X 141 l Variables : n .... no of equations 111 :E: l .... dimension l Ltnl .... ding-1 index array 144 fa .... A[1 in file 145 t xCn*m].... two dim. array 146 l ytn*ml.... two dim. array (output) 147 l WC1 .... work space of size 148 l max(LCil-L[i-11) 149 * l . . . . . . . . . . . . . . . . . . . . . .......................*. 150 1:: C INTEGER i,j,k,iO; INTEGER4 mr: 1:: *xI,*xj,*yi3*yj,,va* REAi 155 156 mr = sizeof(REAL); for (i=O yi=y; icn- +*i) ::: for (k.6; kw; ++&,**yi) (*yi) = 0; 155 for (i=O,rcwind(fs);ia; ++i) % C fread(w~mr,LCil-Lti-11,fa); for (j=~O=S(i),xi=Lxti*mJ,yi=&yti*ml; j'i; ++j) for (k=O,yj=Py(j*ml,xj~Lx~j*ml,v~=w(j-iO1; ka; ++k) :z C yi(k1 += va*xj[kl. yjrkl += vr'xjtkl; > 164 for (k=O,va=utj-iO1;k&; ++k) 165 yiCk1 l = ve*xi(kl; ) :z return(O); :: 17( ) 171 17Z INTEGER mcano(n,m,L,fa,x,r,w)

:

sakradr 1993/ Printodr 1993/

2 6/26 C/36

M. S. Park

INTEGER ",m; INTEGER4 LII; REAL x[l,rcl,w[l; FILE lfe;

,*____________________________________________* l l

.

l

CANNGNICAL IUJLTIPLICATIOY Q) l Profile Matrix A (Ness) Output = R R = X’.A.X : n

vari9bles

. . . . no of equetions .... dimension m Lb1 .... disgcael index array l fa .... At1 in file * xWd.... two dim. errev t two dim. farrev(output1 l WI1 .... work spece of size l mex(LCil-Lti-111 '_____________________________________-------~,

l l l

rim%i....

C INTEGER i,j,k,l,iO; INTEGER4 mr; lxi,*xj,*rk,ve; REAL mr = sizeof(REAL); for (i=O; i*(mrl)/2; l*il rtil = 0; for (i=O,renind(fal;Ia; +*iI < fr6d(w,m,Ltil-LCi-l~~fe); for (j=1O=S(i),xi=Px[1ml; j
INTEGER kcno(n,m,L.fe.x.r.~,y.d) INTEGER n,m; INTEGERC LII; x~~,tI~,w~l,ytl.dIl; REAL FILE 'fa;

,.____________._______________________________.

'* CANNGNICAL WJLTIPLICATIW t m e Profile Factorized Matrix A (StiffwSS) Output = R R = X'.A.X = X'.L.A.L'.X Variebles : n . . . . noortions .... m Lb1 .... diegcael index arrey fa .... AC1 in file two dim. array XWRI.... rWml.... two dim. sway (output) ytrhl.... work space WC1 .... work spece of size mex(LCil-Lti-11) ._____________________________________-----~, c INTEGER i,j,k,l,iO; INTEGER4 mr; lrk,*yj,*xi,va; REAL mr = sizeofCREAL1; for (k-0; ka*(wll/2; ++kl for (j=O yj=y; ja; +*j) for tk=b: k-n: ++k.*+yj)

rtkl = 0;

for (i=O,renind(fe);i?n; ++if ( fred(w,mr,LCil-Ltl-ll,fel; for (j=rO=S(il,xi=&xCiW; j
Matrix

computation

Page:

KAIST

PROS0LVE.C

8av.d: Printdr

ystem& DesignLnb. )

IYTECER n,n; INTEGER4I. Cl; FILE lf*; REAL lU,btl,ctl,fU,u~l,xCl;

*._______________-~~~---_~_______~~~~~~~~--~~~~~~~~*

*

where

______. inputs n

e is a n-th

m et1 fsC> :I::

cwl+11 fCrnl1 WC1 Qltwts

7::

:

unit

vector,

the

local

parameter

dmension index of

of matrix [al local parameter factorized core stiffness array fectorized file stiffness array profile structure integer i externel reference toed of q-th rcu of A on input) : work arrey(contents : RtlS term : work array for file job solution vector (x2) i modified xl " (X l x2)*0+0/g ______________________________--*,

: profiled : profiled

I’

: ;:W:fiER i,i,k; y.d.R; lfm=stderr; FILE if (m==n) iw; ++i) xCi1 = g*b[il < for (i"O,g=ftnl; if,;f') k = solo(n,fa,x,L,u); k = solp(n, l,x,L); if (k I= n ) fprintf(fr,"\nw Mug-Solver) return((xtnl=g)); 1 for xcll

(i"D,g"D,x[nJ"

xW+l;

iw;

l+i)

xCi1

- ftil;

ZERO

PIVOT

at
" b[il;

" -g;

if (fa) k = solo(n,fa,x,L,u); k = solp(n, a,x,L); else if (k I= n ) fprintf(fn,"\n l * (Aug-Solver) XM " -g;

ZERO

for

+" o*cCil;

fM

(i"D,g"f[n],f[nl"ftnl;

iw;

++i)

fcil

PIVOT

at
" -9;

if (fa) k = solo(n,fa,f,L,w); else k = solp(n, 0,f.L); ftnl " -0; for C

(iLo; i
xtnl

-= ctil*xtil;

if ((d=x[nl-1) I= 0) y = ftnl/d; else fprintf(fm,"\n** (Aw-Solver) for

(i=O;

i<"n;

+*i)

xtil

= y*xtil

ZERO

)

DEWCU\n");

- ftil;

return(x[nl); >

REAL

prepedm,M,L,a,fa,c,w)

INTEGER iWTECER4 FILE REAL

".n: Lil: "fa;

etl,ccl,w~l;

,*.._____.......~~.~~~~~~~~----~~~-----~~~----~~~~~~~* *

Prepere

for SOLPADO

l

* . t

A "

CE d Cl :; ; ;;

"'

CE 0 Cl $ ; ;;

l

nd c " Cd l fl th q-th rou of A (output) .___...__......__...________________________________~,

l

l

741

programs

1993/ 1993/

4 6/26 C/26

M. S.Park

142

r

KAST

PROS0LVE.C

m&Da@Lib,

for

cj=o; j-n;

if t-4

++i1

ctj1

* 0.

c ctnl = 1; return(b); )

if (fM C Y = rirwf(REAL); rwlnd(fr); fr~(fr,lr~(L[r-11*0.0~; froad(u Y LW-Lfa-ll,fr); for (j=b,jb=S(m); ja; ++ ) < ctj1 = utjl: wtjl = d ; ) CW * NW; uw = 1; f~(f~,lr*(Llr-ll+l),o); fwrito(u,lr,LW-Lb-ll,fa); for (I*(: ja: ++I)

< ~6 = siji; _

1: :iz%

_

mr*(Ltj-ll*l+m-jO),o~; fredkijl mr 1 fr)f~~~fr,nr~~Lij~ll*~~-jo~,o~; > fwrit*(Zd.m.l.fa): . . . . ) )

l1w < for Cj=O,jO=SCm); ia; ++j) C ctjl = A(m,j); A(hj) = 0; 1 CW = Mm,m); A(n,m) = 1; for (j-1; j'n; ++j) if (S(j)*=m) C cCj1 = A(j,n); A(j,n) = 0; ) return(O);

Page : Printed: 8av.d:

5 1993/ 6/26 1993/ 6116

Efficient large scaled profile matrix computation programs

Efficient large scaled profile matrix computation programs

Recommend Documents