EVALUATION OF ELEMENT STIFFNESS MATRICES ON CDC STAR-100 COMPUTER AHMEDK. NOOR? and STEPHENJ. HARTLEYS
George Washington University Center at NASA-Langley Research Center, Hampton, VA 23665,U.S.A. (Received 30 June 1977; received for publication 14 September 1977)
Ah&act-An algorithm is presented for the evaluation of the stiffness matrices of higher-order elements on the CLX STAR-100computer. Discussion is focused on the organization of the computation and the mode of storage of the different arrays to take advantage of the STAR pipeline (streaming) capability. An assessment is made of the performance of the proposed algorithm for generating the stiffness matrices of two higher order composite shallow shell and plate elements having 80 and 176degrees of freedom. Also, estimates are given of the CPU times required to evaluate the stiffness coefficients of three-dimensional hexahedral elements using the proposed algorithm. NOME3CL4TURR
WI strain-displacement matrix for the interpolation models
B$ value of the Bij element at the kth quadra(k) ture point using the Ith shape function Bl matrix defined in eqn (9) WI stress-strain (elasticity) matrix det(J) Jacobian of the transformation from local to natural dimensionless coordinates 4 weighting coefficients defined in eqn (3) a,,. J&,, JI,,, JI,, vectors defined in eqn (7) [K”‘] element stiffness matrix [K”] typical block ij of element stiffness matrix In number of nodes in the element N vector length used in the timing formulas N’(i = 1 to m) shape (or interpolation) functions . vectors of shape functions and their derivatives number of numerical quadrature points degree of STAR vector utilization defined in eqn (11) number of degrees of freedom in the finite element model startup time in clocks (40 nsecs each) number of strain components total CPU time in clocks = det[J] * H Cartesian coordinate system Cartesian coordinates of node i vectors defined in eqn (4) dimensionless local coordinate system element domain = (a/ax), (a&), (ala21 = (alaf), (ah), (a/a0 Superscript T denotes transposition, * denotes multiplication.
INTRODUCTION
The advent of vector processing computer systems such as the CDC STAR-100 computer, the Texas Instruments Advanced Scientific Computer and the CRAY-1 computer has brightened the prospects of using higher-order two and three-dimensional finite elements for the analy-
tProfessor of Engineering and Applied Science. $Programmer Analyst.
sis of large complex structures. The limited use of such higher-order elements to date may be attributed to (a) the high computational expense involved in the development of their characteristic matrices (stiffness, geometric stiffness and mass matrices), and (b) the increase in the bandwidth of the resulting system of algebraic equations for the whole structure. The first drawback has been recognized and improvements have been suggested (see, for example, Refs.[l-31); however, the difficulty has not been overcome. The increase in bandwidth of the algebraic equations can become a desirable feature if a vector algorithm is used for the solution (Refs.[4,5]). This raises the question as to whether the use of vector computers in generating the characteristic matrices of higher order two and three-dimensional elements might result in overcoming the first drawback by bringing down the computational times to within the acceptable practical range. The present study addresses this question. More specifically, the objectives of this paper are: (a) to present a computational algorithm for the evaluation of the stiffness matrices of higher-order elements which exploits the vector processing (streaming) capability of the CDC STAR-100 computer; (b) to assess the performance of the proposed algorithm and demonstrate the potential benefits resulting from its use in generating the stiffness matrices of two higher-order composite plate and shallow shell elements and (c) to present estimates of the CPU times required to evaluate the stiffness coefficients of higher-order three-dimensionai hexahedral elements using the proposed algorithm. The computational algorithm described herein is an improvement and an extension of the one presented in Ref.[4]. It is particularly suitable for use.with higherorder elements wherein large number of ‘numerical quadrature points have to be taken for the accurate evaluation of the stiffness coefficients. All the finite elements considered in the present study are isoparametric with curved sides (or faces). The discussion of the computational algorithm is focused on the process of organizing the data and calculations to take advantage of the STAR streaming capability. This provides an insight into the potential of using vector computers for solving complex structural problems and gives an example of the solution strategies required to realize this potential. 151
A.
152
K. NOORand S. J. HARTLEY
SUMMARY OF MAJOR STAR-100 FEATURES
The CDC STAR-100 computer has many new features which distinguish it from current third-generation serial computers. Features which appear to influence most significantly finite element computations are the pipeline (vector) processing capability and virtual memory. These concepts have been discussed in Refs. [41and VI and are summarized in this section.
The startup time represents an inefficiency in the use of vector instructions. Obviously, the timing is best for a particular computation if it is performed with long vectors and few startups, if possible. FINITE RLRMENTFORMULATION
A linear element stiffness matrix (Kce’] can be conveniently expressed in the following form[9]:
Virtual memory
The memory organization on the STAR-100 computer includes a magnetic core storage and random access auxiliary storage (CDC 819 disk storage unit). Memory is divided into 512 word pages where each word is 64 bits (approximately 15 decimal digits). Also allowed are large pages each consisting of I28 small pages or 65,536 words. The central memory of the Langley Research Center STAR computer has 524,288 words (1024 small pages or eight large pages) with a capability to expand to twice that size. The CDC 819 unit is a moving disk system with a capability of 33 million words each. The Langley Research Center STAR computer presently has five drives (a total of 165 million words). The central memory and disk are available to the user through virtual memory concepts. However, to perform the computations, data must be stored in (or transferred to) central memory. The transfer of data to central memory is done in blocks of small or large pages and is considerably slower than computation time; hence, several computations per data item is needed to balance the CPU and IjO times. Moreover, to avoid excessive IjO transfers during any particular phase of computation, data references must be localized in one area of the user’s memory space. Pipeline processing
The central processor of the CDC STAR-100 computer contains a pipeline arithmetic unit which segments arithmetic computation into a sequence of basic operations (such as exponent comparison, coefficient alignment, add and normalize shift). The arithmetic unit can perform basic operations simultaneously on independent pairs of data elements, each pair at a different stage in the computation. A complete arithmetic operation requires a pair of data elements to stream through the entire pipeline (e.g. eight segments). There are two types of data elements on STAR: scalars (a single-valued data element usually contained in one storage word), and vectors (an array of scalars usually contained in consecutive storage words and addressable by a starting address and length). If the same operation is performed on successive data pairs (STAR vectors), results can occur at 20 nsec intervals for some operations (Ref. [7]). These vector operations can lead to substantial gain in speed over sequential scalar arithmetic. The execution time for a vector instruction on the STAR computer is of the form (see Ref.[8] and Appendix A):
T=S+y where T is the total time in clocks (40 nsecs each); S is the startup time which is fixed for a particular vector operation; iV is the vector length and 1 is the number of results that are obtained per clock for the particular operation.
(2) where [B] is the strain-displacement matrix for the interpolation models, [D] is the stress-strain matrix, det [J] is the Jacobian of the transformation from local to natural (dimensionless) coordinates and W) is the element domain. For higher-order elements it is customary to use numerical quadrature for evaluating the integral in eqn (2). This can be computationally expensive, especially if large number of numerical quadrature points are required to obtain accurate results. To simplify the assembly process, the element stiffness matrix is partitioned in both the row and column directions into blocks. Each block corresponds to the degrees of freedom at a node (see Fig. 1). The expression of the stiffness coefficients for a typical block ij is as follows:
[K”] =
2 H,(det [Jl),[Bil,TIDIIB’l,
(3)
,=I
where H, are weighting coefficients; n is the number of numerical quadrature points and [B’] and [B’] are the strain-displacement matrices based on the ith and jth shape functions, respectively. Due to symmetry of the stiffness matrix [K”‘], only the blocks [K”] lying on one side of the main diagonal are formed (Fig. 1). In the present study both two and three-dimensional elements are considered. The elements considered are shown in Fig. 2. The [B] and [II] matrices for these elements are given in Appendices B and C. Symmetric
**. ...
El
Km,m -
Fig. 1. Nodal partitions of element stiffness matrix. VRCTORi+TION OF RLRMRNTSTIFFNESSCOMPUTATION
The performance of a particular algorithm for evaluation of the element characteristic arrays depends on the extent to which the STAR vector processing captibility is utilized in that algorithm. The process of organizing the data and calculation within tXnumerical algorithm so that the operations performed take advantage of the STAR streaming capabilities is usually referred to as uectorization of the algorithm. For a task to be vectorizable it should contain three characteristics: (a) repeated operations; (b) independence of earh’result from the others
Evaluation
of element stiffness matrices on ,CX
STAR400
computer
153
----Y
XJ
X 16-node plate element
ltj-node shallow shell element (al Two-dimensional
P-node quadratic element
elements.
2?-node triquadratic
element
32-node cubic element
(b) Three-dimensional hexahedral elements. Fig. 2. Two and three-dimensional
and (c) the members of each operand packed into contiguous locations in memory. In this section, a compu~tion~ strategy, which is geared towards the STAR pipeline capab~ity, is devised. To fix ideas the procedure is applied to generate the nodal partitions [&?I, eqn (3), of an m-node twodimensional element using Lagrangian shape functions. Numerical quadrature is used with n quadrature points. In the development of the computational algorithm, the following two assumptions were made: (1) The field length required for generating the stiffness matrices was kept smaller than the size of the cenW memory of the STAR computer. This was done to minimize the paging (II0 bperations). Also, if STAR is used in a uniprogranuning mode, the large central memory (524,288 words on the Langley Research Center computer) allows the use of large-size arrays. (2) Checks on zeros were performed in order to suppress the floating point operations on zero vectors and subvectors. For convenience the development of the nodal partitions [K”] is divided into three separate phases, namely: (a) Generation of vectors of shape functions. (6) Generation of the strain-displacement and elasticity matrices; and (c) Evaluation of the nodal stiffness coefficients. The involved in each of these three phases .. operations _ are discussed subsequently.
elements used in the present study.
Generation of vectors of shape functions
In this phase the vectors of shape functions and their derivatives with respect to the spatial coordinates x and y are generated. This is done by first generating the values of the natural c~rdinates 8, t at the n- numerical quadrature points and treating the we~ti~ coefficients corresponding to these points as a vector Ii of length n. Then STAR streaming capability is used for evaluating the m shape functions and their derivatives with respect to the natural coordinates 6 and q. Each of the shape functions Ni (i = 1 to m) and its f, 9 derivatives, $Ni and a,,Ni, is treated as a’vector of length n (see Fii. 3). The next step is to form the following four linear combinations of the vectors @Vi and a,,N’
t4) I where Xi and yc are the cartesian coordinates of node i and the symbol * denotes multiplication. Each of the vectors Z1, Zz, Zs and Z, has a length equal to n. The vectors dtt[J] and w can now be obtained as follows: det[J]=Z,*&-Za*Zd w=det[r]*H
(5) (6)
whkre det IJl - * is the Jacobian determinant vector whose
154
A. K. NOOR and S. J. HARTLEY
i = 1 to m
-1
Fig. 3. Storage
I]
organization
of vectors
elements are the values of det [J] at the different quadrature points; the multiplications in eqns (5) and (6) are STAR vector multiplications (element by element multiplications). The vector w is replicated m times yielding a vector of length nm. This vector is needed in Phase 3 of the computation. The vectors of derivatives of the shape functions with respect to x and y can be obtained by using the following formulas: a,N’ = JI,, * a,N' + JI,* * d,N’ (7)
of shape functions
z vector of length
n
and their derivatives.
The sequence of steps involved in forming the matrices [B], [D] and [BITID] is as follows: The three vectors N, C&Nand +V, each of length nm, are formed by stringing the individual vector N’, QV’ and a,N’ (i = 1 to m) together (see Fig. 3). The three vectors N, d.N and QV are then used in forming the matrix [BIT whose elements B, are vectors of length nm (see Fig. 4). The elasticity matrix [D] is transformed into a matrix of vector elements by broadcasting its coefficients nm times. Then the matrix [a] is formed by taking the product of [BIT and [D], i.e.
a,N’ = JI,, * a,N’+ J12** d,N’
[W = Wm.
where
r J&,-l
(9)
The elements of the matrix [a] are vectors of length nm, and have the same structure as the B, vectors
shown in Fig. 4. A typical element Bij is given by: 8, = 2 Bil * D,j I=,
(10)
The vector (l/det [J]) is obtained by taking the reciprocals of each of the elements of det[J] (one STAR vector division); and the products in eqns (7) and (8) are element by element products.
where the multiplication in eqn (10) is an element by element multiplication.
Generation matrices
Evaluation of the nodal stiffness coeficients
of
the strain-displacement
and elasticity
All vectors considered in Phase 1 had a length equal to n (number of numerical quadrature points). To take advantage of the pipeline capability of the STAR computer, longer vectors should be used. As shown in this section, it is possible to organize the computation in such a way that vectors of length greater than n can be used in Phases 2 and 3. This is achieved by forming simultaneously, the blocks of the element stiffness matrix that occupy each column (e.g. for the jth column, the blocks are [K”‘], [Ki+li]. . . [Km’], see Fig. 2) thereby increasing the average vector length to n(m/2). The increase in the vector length is associated with a considerable increase in the storage requirements. However, since STAR is currently operating in a uniprogramming mode, storage does not appear to cause any difficulty. Should STAR operate in a multiprogramming mode, storage may become a problem.
In the third and final phase of computation, the nodal stiffness coefficients [K”] are evaluated. The independent blocks of the element stiffness matrix that occupy each column are formed simultaneously. The operations involved in the evaluation of the stiffness coefficients of the jth column blocks (i.e. [K’.‘], [Kj+‘.j], . . . [Km.‘]) are as follows: (1) The elements of the strain-displacement matrix [B’] are formed using the vectors N’, &N’ and a,N’ generated in Phase 1. Each of these vectors is of length n. (2) Each of the nonzero coefficients of the matrix [B’] is replicated (m - j + 1) times yielding a vector of length (m-j+l)n. (3) The matrices [KU] (I = j to m) are formed by takine the products of [a] and [B’]. The elements of the matrix [K”] are vectors of length (m - j+ 1)n. As an
155
Evaluation of element stiffness matrices on CDC STAR-100computer
B:l B;l . . . B:l (n) (1) 121
(1
B:* . . . (1 .
I Bil $I (1)
(1) (2)
i
. .. &
+
(n)
(2)
B; BE . . .$
. . . . .
(nl
(1) (2)
(1)
I_’ (I
fn)
I i
B;l*B;I . . . Bfl
(1) 12)
(1) (21
(n)
2 B2 2 . . . Bls ls Bls (1) (21 (n)
. . . . .
$1
m
m
Bls . . - Bls
(11 (21
I
(“I
3
I
1
. . . . .
. . . B2s B;s Bgs . . . Bzs (nl
(1) (2) I”) - 3
i
(1) (2)
B,ml‘3; . . . B,ml
. . . . .
(n)
: : .
i
Bis &
(“) 3
Bfl Bfl . . . Btl I
1 1 1 Bls Bls * * * Bls (1) (2) (n)
(1) (2)
1 I I \
: .
3
B2”1B; . . . B2”1
. .. . .
. . . B;1
(2)
‘“I
\
(1) (2)
‘3; B; . . .B; (l!
In)
(2) (“)
3
: . B1
B1
ri,’
t;;
Bi; = Value of the IK)
using the
. . . B1 4
Bfs BFs . , . BFs (1) (2)
Bii element at the
0th
K th
. . . . .
quadrature
B; B; . . . B; (1)
In)
(2)
(n)
point computed
shape function.
Fig. 4. Storage organization of the strain-displacement matrix [B].
the evaluation of the coefficients [K&l is shown in Fig. 5. (4) Each of the elements of the [K”] matrix is multiplied by the subvector of w of length (m - j+ 1)n. The stiffness coefficients Kz are then obtained by summing the n coefficients Ki, Kz,. . . Kz which occupy con(1) (2) (n) tiguous locations in memory. This summation is done by a single hardware instruction. The storage organization of the coefficients of the submatrices [K”] is shown in Fig. 6. example,
txmmf.wE EVAL~JATION 0F momsm momm To assess the computational efficiency of the proposed approach, a computer program was developed based on this approach to generate the element stiffness matrices of two ldnode isoparametric composite elements. STAR
FORTRAN was used as the programming language. This program was integrated with an assembly and solution modules as well as with pre- and post-processors to form a Finite Element System for STAR (FESS), which is operational on the Langley Research Center STAR computer. The discussion herein is limited to the program modules of FESS used in generating the stiffness matrices of the two elements. The two elements considered are: (a) &ear-flexible composite shallow shell element with five degrees of freedom at each node (a total of 80 degrees of freedom); and (b) a composite plate element based on a higher-order two-dimensional plate theory which accounts for transverse shear deformation, transverse normal stress and transverse normal strain. The element had 11 degrees of freedom at each node (a total of 176 degrees of freedom). Both elements use Lagrangian interpolation functions and the number
A. K. NOORand S. J. HARTLEY
+ 1 1 1 2 2 2 n12 a12 ... a12 n12 a12 ... a12 (11 m In) Ill (21 (Ill
.
i
i
j
j+l j+l
j+l
. n12 a12 ... n12 12 a2 ... a12 111 IZI Bi ,i 21
a
Ill) 111 121
1111121
nil Bhnil... Bj
.,.
In) (11 ra
hll
Bi ,i
21
11) la
m
..-.lrnrn . . 612 s,* ... 312
bll Y .
(nl
.
.
,..
n 21 Ill I21
...
Bh IN
.: 1 2 2 2 I 1 e1r PlT ... dr o1r o1r ... 011 Ill PI (n) Ill 121 Iill
.
i
j
jtl j+l
j+l
InI Ill 12)
IllI
j
. . r1r 01r ... a1r *1r olr ... 9]r 111 m
mm
m
. . . . *1r a1r ... 911 ill (21 InI
* Bjrl
...
...
. . . .
.
Fig. 5. Evaluation of K& (I = j to m).
.
21 21 51 52
.
” . K1.r
Kzs =
(r.s)
element
in the li,il
partition
of
the elementstiffness
matrix
Fig. 6. Storage organization of the element stiffness matrix [IC?
quadrature points used in evaluating the integrals was 25. The fundamental equations of the two-dimensional theories used in developing these elements are summarized in Appendix B. A flow chart of the program modules used in generating the element stiffness matrices is shown in Fig. 7. In the process of implementing the proposed approach, a number of programming techniques were investigated and the experience gained in the three areas of matrix multiplication, replication and reduction of overhead is discussed subsequently.
(1) Matrix multiplication algorithm
Two matrix multiplications occur in the process of evaluation of the element stiffness matrices (boxes 5 and 9 in Fig. 7). The entries of those matrices are vectors. The matrix product was obtained by forming linear combinations of rows (or columns) (see Ref.[61). In addition, zero checks were made on the first element of each of the vector operands in the next to the innermost loop. This resulted in a 25 and 58% reduction in CPU time from that required by standard matrix multiplication scheme (with no zero checks) for the elements with 80
Evaluation
v
START
@
@r&-z&l
0
157
of element stiffness matrices on CDC STAR-100 computer
Multiply each vector
Evaluate shape functions and their derivatives
.
.
Phi e III summing il
broadcast the elements to obtain matrix with
@
@
sections
Phase
Loop over columns j=l
: Form [~j] and replicate it jl times
I
Fig. 7. Flow chart
for programmodules used in generating element stiffness matrices.
and 176 degrees of freedom, respectively. Addition of another zero check in the innermost loop reduced the CPU time by only 1%. (2) Repl~~afion scheme The proposed algorithm tails for replication of vectors (i.e. forming several contiguous copies of the same vector-boxes 2 and 8 in Fig. 7). This was done by using a single hardware instruction (transmit indexed list,[‘lJ), which was found to be more efficient than the use of vector transfers inside of a loop. For the 176 degrees of freedom element it resulted in a 3% reduction in CPU time. Addition of a zero check of the first element of each vector to be replicated (to avoid replication of zero vectors) reduced the CPU time for the 176 degrees of freedom element by 25%.
(3) Reductionin the overheadassociatedwithevaluating subscriptsand buildingvectordescriptors The critical performance areas of the program, where most of the CPU time is expended, were found to be those associated with blocks 9, 10 and 13 in Fig. 7. A reduction of 10% in the CPU time was achieved by reducing the overhead associated with evaluating subscripts and building vector descriptors. This was accomplished by: (a) Explicitly forming the invariant descriptors in the nested DO Loops in the critical performance areas and moving them out of the inner DO Loops; (b) Evaluating the subscripts explicitly using onedimensional arrays with invariant portions moved out of inner loops and replacing the multiplications by additions (see Ref. [IO]).
158
A. K.
NOORand
If an optimizing compiler had been available, it presumably would have performed the overhead reduction tasks and the analyst would not have had to be concerned about them. Figure 8 shows the effect of including the aforementioned programming details and of increasing the vector length on the CPU times required for generation of the stiffness matrices for the 80 and 176 degrees of freedom elements. The upper part of the figure shows some alternate schemes and the lower part shows the corresponding CPU times. Schemes I and II in Fig. 8 are based on forming each of the individual blocks of the element stiffness matrix [I?‘] separately. The vector length in this case is only 25. All other schemes are based on the algorithm outlined in the previous section wherein
Scheme I
Average
! I vector length
/
25
I 1 25
III
~ 200 I
+--_ IV
1200
!I
I /
SIMULATORGENERATED NUhiERICAL, RESULTS
To estimate the CPU times required on the STAR for
the generation of different two- and three-dimensional elements using the proposed algorithm, a simulator was
Matrix multiplication algorithm
Replication scheme
programming
Standard with zero checks Vector transfers
linear combination of rows with zero checks
None . Vector transfers with zero checks
I
200 ~ 2~
HARTLEY
the blocks occupying each column of the element stiffness matrix are formed simultaneously. The average vector length is thereby increased to 200. Table 1 summarizes the CPU times and field length required for generating the stiffness matrices on both the STAR-100 and CYBER-175 computers. As can be seen from this table, the proposed algorithm leads to an order of magnitude reduction in the element-form time compared with that required on the CYBER-175 computer.
1
II
VI
S. J.
_ Linear combination i;lows w;) zero checks in innermost
-.
Transmit indexed ;“_‘ instruction)
1 list
1
Reducing overhead in evaluatirlg subscripts and building vector descriptor
1.0 CPU time .W in seconds
CPU time in seconds .5
.25
(a) 80 degrees of freedom element. Fig. 8.
Ibl 176 degrees of freedom element
Effectof various programming details on the CPU time required for generating the element stiffness matrix.
Table 1. CPU times and held lengths on STAR-100computer(usingSTAR FORTRAN2.0 CYCLE FTN 115PZ Compiler under STAR OS 1.1.5operating system) for the two sample elements , Element It Total numberbf degrees of freedom CPU time (in msec) on STAR (Decimal)tQ
STAR-100
Vector length = 25 . Av. vector length = 200
CYBER 175 . Vector length = 25 Av. vector length,=200
‘. 80 727 I44 , 1330 f 12,000 53,400
Element II$ 176 552 282,100
tShear-flexible composite shallow shell &merit. #Composite plate element with ffansverse s&+ar deformation, transverse normal stress and transverse normal strain included. BIncludingstorage of stiffness matrix for one element.
Evaluationof element stiffnessmatriceson CDCSTAR-100computer built by removing the actual floating-point operations from the element modules in FESS and replacing them by counters and timing estimates for vector operations. The reliability of the simulator in predicting the computational requirements was tested by comparing the estimated and actual (measured) CPU times for the two 16-node elements. The scalar operations involved in the algorithm were not included in the simulator since they are restricted to a small portion of Phase I and the CPU time associated with that part was found to be less than 2% of the total CPU time. The input to the simulator consists of: (i) number of nodes in the element; (ii) number of numerical quadrature points; and (iii) sparsity patterns and dimensions of the matrices [B] and [D]. The output consists of the number of arithmetic operations in each of the three phases as well as the CPU time involved in the vector operations of the proposed algorithm. In addition, an attempt was made to assess the efficiency of the vector operations in the program. As a measure of the efficiency of vector operations, the degree of STAR vector utilization was used. This is a weighted average of the full streaming capacity of the STAR and is defined by the following formula:
159
When the STAR FORTRAN compiler is optimized, it is expected that the overhead will reduce significantly. An evidence to that is the 10% reduction in CPU time resulting from improving the evaluation of subscripts and building vector descriptors in the critical performance areas of the program. An examination of the results presented in Table 2 reveals that for three-dimensional hexahedral elements with n 532, the CPU time required for generating the element stiffness matrices is less than half a second. As the number of nodes and/or numerical quadrature points increases, the average vector length increases thereby increasing the STAR vector utilization ratio.
POSSIBLE FURTHERIhlPROVRMENTSAND EXTENSIONS On the basis of the experience gained with the FESS program, further improvements can be made to reduce the CPU time and the storage requirements for evaluation of element stiffness matrices. Possible areas of improvements which are currently under investigation by the authors include: (1) Storing only the nonzero elements of the [B] matrix. This can be accomplished by using a pointer matrix which indicates the locations of the nonzero elements. In addition, the elements of the [D] matrix can be used as broadcast constants in the product [B]=[D], eliminating the need for explicitly broadcasting each (11) element. This can result in a considerable saving in the storage requirements at the expense of some increase in the CPU time. where the summation extends over all vector operations (2) Forming all the blocks of the element stiffness in the program and the different symbols are defined in eqn matrix simultaneously. This can be achieved by stringing 1. the [B’] matrices (j = 1 to m.) together prior to the Table 2 summarizes the results obtained by the simu- premultiplication of the [B] matrix. Such a scheme lator for a number of three-dimensional hexahedral ele- results in increasing the average vector length from the ments. Also shown in the table are the estimated and present n (m/2) to n m[(m + 1)/2] (i.e. from 200 to 3400 actual (measured) CPU times for the two-dimensional for the lbnode elements with 25 quadrature points). The elements. The difference between the actual and esti- associated reduction in CPU time is expected to be quite mated CPU times is shown to be 42 and 34% for the substantial. elements with 80 and 176 degrees of freedom, respec(3) Performing the two matrix multiplications involved tively, and can be attributed to the overhead associated in ([BITID])[B] simultaneously, thereby removing the with: need to reserve storage for the intermediate product (a) calculation of the subscripts of arrays; [Se]= [BITID]. In addition, the subvectors of each row (b) initialization, incrementing and testing of DO Loop of [fl, of length n, can be processed (weighted, summed indices; and the result stored in the appropriate location) as they (c) subroutine parameters and calls; and are formed, thereby reducing the storage requirements (d) building of vector descriptors. for the product. Table 2. EstimatedCPU times for vector operations involved in the generation of element stiffness matrices
Element Two-dimensional shallow shell Twodimensional composite mate
Three-dimensional hexahedral elements
Numberof
Material
Number of nodes
degrees of freedom
quadrature points
Anisotronic
16
80
25
Anisotropic
16
176
25
365 (552)*
0.519
Orthotropic Anisotropic Orthotropic Anisotropic Orthotropic Anisotropic Orthotropic Anosotropic
20 20 27 27 32 32 32 32
60 60 81 81 % 96 % %
27 27 27 27 27 27 64 64
51 56 86 93 116 126 200 219
0.444 0.446 0.460 0.462 0.468 0.469 0.639 0.640
Numbers in parenthesis CAS Vol. 9, No. 2-D
Estimated CPU time for vector operations (msec)
Numberof
84 tt‘W*
refer to measured times on Langley Research Center STAR computer.
STAR vector utilization ratio
0.496
160
A. K. NOORand S. J. HARTLEY
The aforementioned discussion demonstrates the potential of using the proposed algorithm for generating the stiffness matrices of higher-order elements on the. STAR computer. It is expected that the reduction in CPU time and storage requiremenrs resulting from the use of the foregoing possible improvements, coupled with other advantages of higher-order elements, will make it practical to carry out large scale computations with these elements in such complex areas as nonlinear and dynamic finite element analysis. CONCLUDING REMARKS
An algorithm is presented for the evaluation of the stiffness matrices of higher-order elements on the CJX STAR-100 computer. Discussion is focused on the process of organizing the data and calculations to take advantage of the STAR streaming capability. An assessment is made of the performance of the proposed algorithm for generating the stiffness matrices of two higher-order composite shallow shell and plate elements having 80 and 176 degrees of freedom. Also, a simulator is developed to estimate the CPU times required to evaluate the stiffness coefficients of three-dimensional hexahedral elements. The results of the present study suggest the following conclusions: (1) The use of the proposed algorithm on the STAR100 computer can significantly reduce the computation time of the stiffness-matrices of higher-order elements. For example, for the 80 degrees of freedom element an order of magnitude reduction in the CPU time was achieved over that required on CDC CYBER-17s computer. As the number of nodes and/or numerical quadrature points increases, the average vector length increases, thereby increasing the STAR vector utilization and speed-up ratios. (2) Programming details (e.g. matrix multiplication and replication schemes) are important for the efficient use of the STAR computer. (3) Estimating the CPU time required and assessing the efficiency of different algorithms on the STAR computer can be economically carried out through simulation of the software operations, a technique well used in computer science but not widely applied to engineering software. Such an approach provides key information on critical performance areas without undue commitment of computer resources. These first results demonstrate the high potential of the STAR computer for structural problems. The in-
creased speed achieved by using the proposed algorithm opens the door for carrying out large-scale computations using higher-order elements in such complex areas as nonlinear and dynamic finite element analysis.
5. J. Lambiotte, The solution of linear systems of equations on a vector computer. Ph.D. Dissertation, University of Virginia, Charlottesville (1976). 6. A. K. Noor and S. J. Voigt, Hypermatrix scheme for finite element system on CDC STAR-100 cornouter. Comout. Struct. 5, i87-2% (1975). 7. Hardware Reference Manual for Control Data STAR-100 Computer, Publication No. 60256oo0,Revision 09 (1975). 8. CDC STAR-100 Instruction Execution Times, Control Data Corporation, (Preliminary) Publication No. 60440600,Revision 01 Corrected and Clarified (1976). 9. 0. C. Zienkiewicz, The Finite Element Method in Engineering Science. McGraw-Hill, London (1971). 10. FORTRAN Language Version 2.0 Reference Manual for CDC STAR Computer System, Control Data Corporation. Publication No. 60384500,Revision C, (1976). I I. A. K. Noor and M. D. Mathers. Shear-flexible finite element models of laminated composite’plates and shells. NASA TN D-8044(1975). 12. K. H. Lo, R. M. Christensen and E. M. Wu, A high order theory for uniform and laminated plates, NASA CP-2001, Vol. 1, pp. 157-166, In Advances in Engineering Sciences. (1976). APPENDIX A
Timing estimates for vector instructions
The timing estimates for the vector instructions used in the simulator are based on the following assumptions. (a) No overlap of computations is allowed on STAR; (b) Memory and register conflicts neglected. The vector timing estimates are given in the following table:
Execution time in STAR clocks (40 nano-seconds) for vectorst of length N [8]
Operation Add/subtract Multiply Divide Transfer Replication (p times) Summation of N components of a vector
71+4 [N/81 159+ 8 [N/81 167+ 16 [N/8] 91+4 [N/81 72 + /.L(56+ N/2) 121+6N
t64 bits per vector element APPENDIX B
Fundamental equations of shallow shell and plate theories used in present study
The strain-displacement and elasticity matrices for the composite shallow shell and plate elements used in the present study are: Shallow shell elements [Ref. 1I]. The element has five degrees of freedom at each node. The [B] and [D] matrices are given by: Strain-displacement
matrix
'a, .
REFERENCES
B. M. Irons, Economical computer techniques for numerically integrated finite elements. Int. J. Numer. Meth. Engng l(2), 201-203(1%9). 2. A. K. Gupta and B. Mohraz, A method of computing numerically integrated stiffness matrices. ht. .I. Numer. Meth. Engng 5(l), 83-89 (1972). 3. C. M. Andersen and A. K. Noor, A computerized symbolic integration technique for development of triangular and quadrilateral composite shallow-shell finite elements. NASA TN D-8067 (1975). 4. A. K. Noor and R. E. Fulton, Impact of CDC STAR-100 computer m finite element systems, .I. Struct. Div., ASCE 101,731-750(1975).
k,;.
.
ay k,I.
1.
ay a, 2k,, I
[B’] =
-----__ .
I___. la, 1. a
Y
Iay a, ‘------r---
a,11
.
ay i.
i
where k,, k2 are the curvatures in the x and y directions and k12 is the twist of the surface.
161
Evaluation of element stiffness matrices on CDC STAR-100computer Elasticity
Elasticity
matrix
matrix
-
D,,
42 42
ID1= Symmetric
WI = L
Symme~ic The underlined terms are zero for isotropic and orthotropic materials. &8 1
The underlined terms are zero for isotropic and orthotropic shells. Composite plate elementsIl21. The element has eleven degrees of freedom at each node, its [I?] and [D] matrices are given by: Strain-displacement
axa* 1
.
matrix
T
P
.
*
:;.
.. ,I. .I’
*.
.
.
,I.
:I*
.
.(.
Evaluation of vectors of derivatives respect to Cartesian coordinates
of shape functions
The procedure for evaluating the vectors &N’, a,N’ and a,N’ is h simular to that outlined for two-dimensional elements (eqns. 4-g). The corresponding equations for three-dimensional hexahedral elements are given below. a$ = J&, * a,N; + JI,, * a$ + JI,, * a,N’ ,?,.N;= JQ, * a,N'+ JI,, * a,N'+ JIt3 * a$ a,N’ = JI,, * a,N’ t JIs2* a,N'+ JI,, * a,N
* .
.
(i = 1 to m)
_ai__4__li,_L___:_____1:___L . . “I” (+ [B’] =
.
.
.I.
.
.
.I.
x a,.
.I.
’
a, . r~_-_-+--L_ I . . . . i . . 1 .’ ay . I . . .,. . . I ay a, + I s -~+_-__--~-__--_ _
.
.
____ . .
.
.I*
> .
.
.
.,*
’
N’
.
.
.
.
.;..
.
, s ___ *
.
y-
.
$9 dY
.a
* .I* __-_~_l_~_,_*_r~_Tr
. . axt T
--___--_+
where 6, r), [ are dimensionless local coordinates
.
I . . ff.
. . . t ay
______T~_----__ . .
.
.f.
. ; * _L__~__!YL‘ . . .,. -----t. aXf7 _i_. _l./_‘__L_ .I. . I :__L_~_LI--~_-5 * .
' .
.I' .I.
2 ‘. 1 L- T__+-_7_
. r. .I.
' .
f
ay,.
3.
The elasticity matrix [D] is a 20 x 20 symmetric matrix having 9.r = 0._for ,_ any pair i, j with i < 15 and j > 4.
a*Nt *xi’
a,N:* Y, a,N'* Z.
APPENDIXC
a N'*i. a;N'* y;
Fundamental equations of three-dimensional elasticity used in development of hexahedral elements The hexahedral elements considered in the present study had
d N’*z 8-N’ * d:N; * yi ,a,N‘ * z,_
x:
three degrees of freedom at each of their nodes (three displacement components). The [B] and [D] matrices for these elements are given by: Strain-displacement
matrix a,
. .
.
a
.
.’ a [B']= y--a,_2;_ i
anddet 14 = (Z,, * Z,, - Z,,*z,,) * z,,
1 N'
with
+ cz,3
* z2,
-xl,
* z,,)
* z32
+(~,2*z*3-2,3*Z22)*Z3,
Each of the vectors in the above equations is of length n.