A massively parallel implementation of the spectral element method for impact problems in plate structures

A massively parallel implementation of the spectral element method for impact problems in plate structures

Computing Systems in Engineering, Vol. 5, No. 4-6, pp. 375 388, 1994 Pergamon 0956-0521(94)00013-1 Copyright i('; 1995 Elsevier Science Ltd Printed...

5MB Sizes 27 Downloads 139 Views

Computing Systems in Engineering, Vol. 5, No. 4-6, pp. 375 388, 1994

Pergamon

0956-0521(94)00013-1

Copyright i('; 1995 Elsevier Science Ltd Printed in Great Britain. All rights reserved 0956-0521/94 $7.00 + 0.00

A M A S S I V E L Y P A R A L L E L I M P L E M E N T A T I O N OF THE SPECTRAL ELEMENT METHOD FOR IMPACT P R O B L E M S IN P L A T E S T R U C T U R E S ALBERT N. DANIALf a n d JAMES F. DOYLE~ tThe MacNeal-Schwendler Corp., 815 Colorado Blvd., Los Angeles, CA 90041, U.S.A. ~School of Aeronautics and Astronautics, Purdue University, West Lafayette, IN 4?907, U.S.A. Abstract--The spectral element method is a frequency domain formulated matrix method for the dynamic analysis of structures. We show that an important attribute of the spectral formulation is its suitability for implementation on large multiprocessor computers. In this paper we summarize the development of the spectral element method for dynamic plate problems and outline the implementation of the algorithm on MasPar MP-1 and MP-2 computers with 16 384 processors each. The implementation exhibits near perfect scalability. A direct consequence of the ensuing high computational performance is the ability to solve inverse dynamic problems. Example solutions using experimentally measured acceleration response to find the impact site in plate structures are demonstrated.

INTRODUCTION Stiffened a n d folded plates form aircraft fuselages, ship hulls, buildings, bridges a n d vehicle chassis, a m o n g o t h e r structures. These structures endure dyn a m i c loads from m a n y sources. The ability to predict the structural response o f folded plates to d y n a m i c loads is therefore significant. 13 There are several ways to a p p r o a c h this problem. Some of the m o r e p o p u l a r m e t h o d s include finite element discretization with direct time integration a n d finite element discretization with m o d e superposition; Bathe 4 gives c o m p r e h e n s i v e descriptions a n d references for b o t h of these methods. Solution m e t h o d s for transient d y n a m i c plate p r o b l e m s involving realistic structures are c o m p u t a tionally intensive because of the small element sizes, small time-steps a n d / o r n u m e r o u s n o r m a l modes which are required. This c o m p u t a t i o n a l effort becomes c o m p o u n d e d when performing inverse solu t i o n s - s u c h as when d y n a m i c response at a few locations is k n o w n a n d one desires i n f o r m a t i o n a b o u t the load which causes the response (where it acts, what the force history is) or a b o u t the structure itself (whether or not there are cracks, loose fasteners, or d e l a m i n a t e d areas). These d y n a m i c inverse p r o b l e m s have applications in d a m a g e assessment, non-destructive testing a n d remote sensing. T h e i r solution, however, typically requires multiple solutions to the forward problem. In this work we overcome the high c o m p u t a t i o n a l costs associated with d y n a m i c inverse p r o b l e m s by reformulating plate d y n a m i c response in a highly parallel m a n n e r , then implementing the solution m e t h o d on M a s P a r MP-1 a n d M P - 2 massively parallel computers. The solution m e t h o d attains near-linear speed-up on 16384 processors and allows

t h o u s a n d s of forward p r o b l e m s to be solved in a reasonable a m o u n t o f time. As a result of this c o m p u t a t i o n a l performance, we are able to solve the inverse problem of force location on a multiply-connected plate structure. The central idea b e h i n d our solution m e t h o d is that structural d y n a m i c b e h a v i o r is i n d e p e n d e n t at different frequencies. By expressing structural response in the frequency d o m a i n r a t h e r t h a n the time d o m a i n , we are able to c o m p u t e c o n t r i b u t i o n s at the different frequencies simultaneously. In this work we describe the derivation of frequency domain, or spectral elements; the i m p l e m e n t a t i o n of o u r spectral element algorithm on the MP-1 a n d MP-2; and show how i n f o r m a t i o n a b o u t the structure or the applied load may be extracted from limited response measurements. P L A T E GOVERNING EQUATION

A folded plate structure experiences b o t h out-ofplane and in-plane forces. By way o f illustration we d e m o n s t r a t e the f o r m u l a t i o n of the former. Details of the latter are given by Rizzi a n d Doyle.' We assume classical plate theory 6 for thin isotropic plates a n d take the e q u a t i o n of m o t i o n for a differential plate element with d a m p i n g a n d external loads as OV2Ven ' + ch ~2~" + ph ~,t~ = q(x, .t', t)

(1)

where

Eh 3 12(1 - v:)

,

?z

8:

V2 = 7 ~ + 7 , (,,v~ ?v k

and w ( x , y , t) is the transverse displacement of the plate, D the flexural stiffness, p the mass density, h the 375

376

ALBERT N. DANIALand JAMESF. DOYLE

z, w(x,y,t)

(a)

Z, w(x,y,t)

V

(b) Fig. 1. Coordinate system and sign convention for spectral plate elements. uniform thickness, c the constant damping coefficient per unit volume, and q(x, y, t) the distributed transverse loading. The constants E and v are Young's modulus and Poisson's ratio, respectively. The coordinate system is illustrated in Fig. la. Boundary conditions 7 associated with this fourthorder differential equation along an edge x = constant are specified in terms of the following: Displacement: Slope:

~14 ~

(3)

q~, = - -

8x

Moment:

M, =

+

(4)

v cy3

[-83w Shear:

(2)

w = w(x, y, t)

be large enough so that the response at the point of interest has diminished to an insignificant value at time T. The latter condition arises from our use of the F F T ; it prevents the response at time T from wrapping around and affecting the response at time zero. Values for W and M are chosen similarly: the wavenumbers q~ must be small enough to sufficiently model the spatial distribution of the applied load in the y direction, and W must be large enough to prevent waves from neighboring space windows from propagating into the window we are monitoring. The values of T, W, M and N we use for the solutions presented in the following sections appear in Table 1. When Eq. (6) is substituted into the equation of motion (Eq. 1) with q = 0, we get

k~, = - q 2m + _ ~/ p/ h c o ] ~- ich~, = -r/2 +fl].

We identify two families (or modes) of behavior given by

k,,,,= + ~

f12, k2m~= + _ i ~ m + fl].

].

(5)

Plots of k I and k2 as functions of o9 and t/ are very similar to the analogous modes for the in-plane case. s Note that in the absence of damping (c = 0), kl can have both real and imaginary parts while k 2 will be entirely imaginary. When damping is present, both kl and k2 will have real and imaginary parts. The general spectral solution for the dynamic response of the plate can now be written as

w ( x , y , t ) = ~ , ~ , [ A m , e ~ ..... +Bm, e ,g..... n

The appropriate shear to be specified as a boundary loading is the Kirchhoff shear. The spectral analysis assumes a general solution of the form

~

~mne

m2rc t/m-- W '

~o,-

(7)

83w ]

r,=--DLsm3+(2--v)8~/~2

w(x,y,t)= ~

(7)

ik.....

e i""Yei~'o',

n=lm=0

n27t T

(6)

where k,,, is determined from the governing differential equation. This discretized form is an approximation to the double continuous Fourier transform; 8 under the proper circumstances, however, this approximation can be highly accurate. Typically, we use values of N ranging from 1024 to 4096 and M from 256 to 1024, depending on the proper choice of space window W and time window T. Values for T and N are chosen together to satisfy two constraints: the Nyquist frequency (given by N/2T) must exceed the most significant frequency components in the input history, and the value of T must

m

+ Cm" e+iklm,lx L) + Dmn e+ik2m,cx L)] xe-i"mY¢~"'

(8)

where L is introduced to associate the reflections coming from a boundary at x = L, and wp(x, y, t) is the particular solution associated with the distributed loading q(x,y, t). Here we let q(x, y, t) = 0 and consider the loads to be applied only along concentrated strips in the y direction. The utility of this approach is demonstrated below where the strips form the interface of plate elements. As a special case, the approach also allows point impact to be applied to the plates.

Table 1. Computational parameters number of y space terms y space window length number of y space terms time window number of time terms

W= W= M= T= N=

00.00 m 10.16m 512 40 960 ps 2048

377

Massively parallel implementation of the spectral element method SPECTRAL PLATE ELEMENTS FOR FLEXURAL RESPONSE

The complete derivation of spectral plate elements, along with solution comparisons with finite elements, is given in Ref. 9. Here we summarize the development of the two-edge element shown in Fig. lb to illustrate the basic steps involved in the derivation and to explain unique features of the method.

Two-edge flexural element Equation (8), with the proper choice of coefficients A, B, C and D, can be used to model a plate with parallel edges separated a distance x = L. (To simplify notation in the following derivation we momentarily omit subscripts m and n on the coefficients A . . . D and related terms. The inclusion of m and n is resumed with Eq. 10.) The coefficients are rewritten in terms of displacements and loads, leading to a dynamic stiffness relation. The process begins by expanding moment and shear expressions (Eqs 4 and 5) by replacing w(x, y, t) with the spectral solution for w (Eq. 8). This gives

This second matrix equation will be abbreviated as {ti} = [J]{a}. We now eliminate the coefficients which leads to [/Tt]{ti} = {P} where [/Tq=[Hl[J] '. The complex [4 x 4] matrix [/Tq is the dynamic plate element stiffness matrix which relates applied forces and moments along the edges to transverse displacement and rotation there. The terms of [/~t] may be found symbolically as in Ref. 9, or they may be computed numerically. The superscript f signifies a flexural dynamic stiffness matrix to contrast it from the in-plane or hmgitudinal dynamic stiffness matrix [/71]. The dynamic stiffness matrices [k/] and [/7~, each of which have throw-off and two-edged representations, are symmetric.

Combination with in-plane plate element Plates of unequal thickness, stiffness, and viscous damping may now be joined along common edges by merely summing together the dynamic stiffness matrices for different elements into a global dynamic stiffness matrix [K]. To join the in-plane and out-ofplane elements, we introduce nodal load and degree of freedom vectors with respect to local coordinates as

12 =D[_iklkveAe

ik,,_ik2kvlBe ik~ ,F~'~'- {~',~. F<, . . Vq,M,, . . . . F,,,F,.~, . . V_, ~I,e }'

+iklk,.zCe ik~,L xl + ikek~qD e ik21L-xl] ~I,. = D[k,. 1A e ik~.~+ k,eB e -ik2.,+ k~1C e -ik'lL" ~) +kv2De ik,
i22

=

iklk~2 -k,q _iklk~,2e ikiL k~le ik,L

&,, 4,

{u} - { a S ,

w,,

T

where the tilde-bar combination indicate terms in a coordinate system local to the element. We then have the combined element stiffness relation

,tEl.

ik2k~q -iklk~, 2 -ik2k~ 1 ] f i t -k~2 -k. -k,, 2 ] _ik2k~j e ~k2L iklkv2 e ~k,t~ ik2k~l e ik~tI k~2e ,k2t k,qe ,k~t k~,2e ik~L ]

where subscripts 1 and 2 refer to nodal values at the edges x = 0 and x = L, respectively, and

k~,l=- -fl2 + ( 1 - V)tl 2, k~2= fl2 +(l - v)tl 2. The above matrix relationship will be abbreviated as {/~} = [H]{a}. Next, we write the displacements at the edges in terms of the four spectral coefficients

o3~

~2

1

[--ikle ik,t

l

-ik2e ,k2t ikle ikl~. ik~e ,~2L

(9)

378

ALBERTN. DANIALand JAMESF. DOYLE

z, w(x,y,O

obtained for first solving for {tT} then forming the inverse F F T of fi where M

GP,n = ~ aq,mnEmCOS(n~y ) fi = / 5 ~ p

Fig. 2. Cross-sectional view of a thin-walled box beam. where [if] is the [8 x 8] matrix containing both [k7] and [~t] for the two-edge element, and a [4 x 4] matrix containing the analogous terms for the throw-off element. The resulting general plate element models both in-plane and out-of-plane waves, and therefore can be joined with other such elements at arbitrary angles 0 about the y axis. Rotation is achieved via the rotation matrix

[R]=

COS0 0 0 1 sin0 0 0

0

-sin0 0 cos0

i ]

.

(11) (12)

gq,,,, refers to the m and n components of the solution vector {gin,} at the degree of freedom q, and the superscript p on G~ refers to the impact site defined through the load position vector {r}. ( ~ is the frequency domain transfer function which relates displacement at degree of freedom q to load applied at degree of freedom p. One of the advantages of having the transfer function in the frequency domain is that the time derivatives of displacement can be found simply by multiplying t~p by (i~o)/ wherej is the number of time-derivatives separating displacement from the desired quantity. Once ~ is obtained, velocities and accelerations are computed as a quick post-processing step by convolving (~P with the applied load.

0 MASSIVELY PARALLEL IMPLEMENTATION

so that the element stiffness matrix in global coordinates becomes

Note that displacements along the y axis (fl, v2) and rotations about the y axis (q~l, q~2) do not change in this transformation. After combining element relationships we get the structural stiffness relation

[g~°l{am°} = {Pro.} ='m/5.{r}

(lO)

where/5 is the F F T of the applied load history and {r} is the unit vector indicating the degrees of freedom Upon which the applied loads act. The real scalars Em are Fourier cosine series coefficients describing the distribution of loads in the y direction. We use E0= 0.5 and Em = 1 for m ~> 1 to represent a delta function (for point impact). Our work uses point impacts so {r} has mostly zeros. The final time domain solution for the motion of a particular degrees of freedom q at a location y is

Both finite element and spectral element analyses of plate structures are computationally intensive. Because of the decomposition into independent behavior at each frequency and wavenumber, the spectral method is well suited to implementation on parallel processors. Here we describe how the spectral element method is implemented on MasPar MP-1 and MP-2 computers with 16 384 processors and demonstrate the near linear increase of solution speed with the number of processors. The bulk of the computational effort in the spectral element method occurs in the evaluation of the dynamic stiffness matrix [/(,,,] and the solution of [gm,]{am,} = {r} for {um~} at each m and n. The solution is time consuming not because of large [gin,] (it has a dimension of 12 for the box beam in Fig. 2) but because M and N must be large to yield accurate solutions. Our examples need about a million solutions to the assembled dynamic stiffness matrix equation. These solutions are independent of each other and may be found simultaneously. This permits us to generate a complete [K] and solve [/~m,]{Um,} = {r} for one combination of m and n in

Table 2. Variable declarations in sequential and parallel codes Sequential Parallel Variable

Dimension

Variable

w, qm kl,,, kz~. {tT,,,} [g,,, }

scalar scalar scalar scalar nDoF nDoF x nBand

{ogs}

{ rls} {kls}

{k2s } {3} [Ks]

Dimension MaxProcessors MaxProcessors MaxProcessors MaxProcessors nDoF x MaxProcessors nDoF x nBand x MaxProcessors

Massively parallel implementation of the spectral element method

379

Table 3. SPlaDyn parallel algorithm

READ

structural properties, model geometry, {r} (vector indicating the loaded degrees of freedom), /~ time and space window sizes, number of frequency and wavenumber terms to use, and MaxProcessors, the number of processors to use. COUNT over all segments: s = 1 to N × M/MaxProcessors Assign Maxprocessors values to .[oJ~}. {~h} Count over all elements: e = 1 to E Compute longitudinal stiffness matrix vJ Compute flexural stiffness matrix .[/~.,] k e~] Augment general plate element: [~,] = [/~/] & [/~d Rotate to global coordinate system [~..i = [T]r[~,][ T] Assemble into global stiffness matrix: [K~] = [K,] + [~~.~] End element loop on e Solve [/~]{fi~}= {r} for ~u~,'~ by factoring [/~] with UTDU decomposition, and forward- and back-substituting. Reshape {fit} into a partial matrix [z~,,,}, where the range of m' depends on s Multiply {film'n} by ~m'COS(q.,.y). the transform of the load distribution in y Sum {G, ] = E m. S~Um. . ~ ~,% COSO/m_V) END segment loop on s

STORE {G } POST-PROCESS {G} by convolving with a load history transform /~ and time derivatives (ioJ~)J

the 16 kbytes of m e m o r y available to one processor on an MP-1 (the MP-2 has 6 4 k b y t e s per processor). Consequently, the M P - I a n d MP-2 can find 16 384 solutions to [/~,,,]{~,,, } - ~t ~ "~~ at once, and, in the case of the box beam, all 1024 × 512 = 524 288 solutions in just 32 repetitions of the assemble a n d solve steps. The essential step of parallelizing o u r sequential F o r t r a n p r o g r a m S P l a D y n (spectral plate d y n a m i c analysis) is r e a r r a n g i n g the d a t a structures, and, t h r o u g h special compiler directives, m a p p i n g indep e n d e n t matrix e q u a t i o n s o n different processors. This eliminates the need for nested loops over m and n.

Mapping the SPlaDyn variables

processors. [/£] depends on the element physical properties a n d on the m o d e s kl and k,, which themselves depend only on co a n d r/..Lr f j.~ remains c o n s t a n t for all o) a n d q, as do the element physical properties, so these need no special declarations on the MP-I a n d -2, c o n s t a n t s are visible to all processors. Variables needing special t r e a t m e n t are therefore ~o, r/, kl, k 2, {tT} a n d [ K ] - - o u r strategy is to p r o m o t e these variables to arrays s p a n n i n g the total n u m b e r of processors, then to place one element of every p r o m o t e d variable o n t o each processor. The dimensionality of each variable in the sequential a n d parallel codes is s u m m a r i z e d in Table 2. These variables have the following similar declarations in the sequential version:

O u r goal is to generate an entire d y n a m i c stiffness matrix then solve a system o f equations with this matrix on each processor. All variables related to the generation of [/~] and the solution o f {tT} must therefore be considered for m a p p i n g o n t o the

real omega, eta c o m p l e x , 16 k l , k2 c o m p l e x , 16 u ( M a x D o F ) c o m p l e x , 16 stf(MaxDof, M a x B a n d )

Table 4. Solution times for several multiprocessor configurations and single processor computers. "Ratio to MP-I' represents how much faster the given system is relative to 1/512 of the run time on 512 processors

Machine MP-I

MP-2 Sun 4 486-66 Mhz Sun LX Sun Center 1000 RS/6000 model 580

Solution times on various systems Number of Time Ratio to processors (seconds) 1 MP- 1 512 I024 2048 4096 8192 16 384 16 384 1 l 1 2 1

619.00 311.51 158.06 79.08 39.92 19.85 8.85 11 668 5784 5349 1144 1096

512.0 1017.3 2005.0 4007.9 7940.1 15 964.7 35 809.4 27.2 54.8 59.2 277.0 289.2

Ratio to 16 k MP- 1 0.0321 0.0637 0.1256 0.2510 0.4972 1.0 2.2430 0.0017 0.0037 0.0037 0.0174 0.0181

Efficiency (%) 100.0 99.3 97.9 97.8 96.9 97.4

380

ALBERT N. DANIAL and JAMES F. DOYLE

20000

c,

/

MP-1

c o m p l e x , 1 6 u(MaxProcessors, M a x D o F ) c o m p l e x , 1 6 stf(MaxProcessors, M a x D o F , MaxBand) cmpf map omega(ALLBITS), eta(ALLBITS) u(ALLBITS, MEMORY) cmpf map cmpf map stf(ALLBITS, MEMORY, MEMORY)

/ /

o- 15000

-- 1 0 0 ° ~

/

~ 10000 ooo

ooo, , -

00~

5000 10000 15000 20000 Numberof Processors

Fig. 3. Speed-ups on the MP-1 using 512, 1024, 2048, 4096, 8192 and 16 384 processors.

T h e A L L B I T S specifier causes one term of the specified index to be placed o n a unique processor. Conversely, the M E M O R Y specifier keeps the terms o f the specified index in the local m e m o r y of a processor. Removal o f loops over m and n

while the declarations for the M a s P a r c o m p u t e r s are real omega(MaxProcessors), eta(MaxProcessors) c o m p l e x , 16 k l ( M a x P r o c e s s o r s ) , k2(MaxProcessors)

Before the t h o u s a n d s o f {zim,} are c o m p u t e d simultaneously, the correct values must be placed into the p r o m o t e d arrays so t h a t each processor operates o n unique data. Disregarding constants, {g} depends only o n [K'm,], a n d [/(,,,] only on co, a n d ~/~. W e need J

force transducer on a pendulum

/

7"

iiii

}u '

(a)

personalcomputer

al~Bi;TLPtrl°a~wi:'t~9

PCB Piezotronicsmodel 482Apower supplies

and 303A03 accelerometers

/ Norland 3001 waveformanalyzer

TektronixTM504preamplifiers

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

(b)

T

Y

787.0 mm

#2 at (15.4, -16.4)nun #3 at (15.4, 28.6) mm #4 at (-46.1, -24.1) mm

X

thickness= 3.200 mm

#4

impactat (0, 0) •t

532.5 mm

~.]

X

icknss

2921o

length- 2.36 m

(c)

#2at(0, 133.2, 0)mm #3 at (0, -53.0, 0) #4 at (63.5, 0, -25.4)mm

impactat (0,0,0) 2 ',=

-' -,

mm

50.8 mm Fig. 4. Experimental set-up and plate geometries.

381

Massively parallel implementation of the spectral element method

for o9,= 5, ~ ] m = 3 6 2 , [Rm=362,n=5]and [um=362.,=5] at the first iteration. A total of N × M / M a x P r o c e s s o r s = 1 0 2 4 x 512/16 3 8 4 = 3 2

3 ~..., 2 j E

1

._o 0

< l

-3

,

,

0

Fig.

,

,

I

,

,

,

1000

,

I

,

,

,

,

2000 Time [ITS]

I

,

,

,

,

3000

5. Transverse accelerations for 51 x 127 mm box beam.

I

4000

impact

on

a

then concern ourselves only with assigning values along the distributed index of omega and eta. The entire range of co,, r/m permutations can easily exceed MaxProcessors, so the range is broken into segments, each of which is exactly MaxProcessors long. Within each segment s, values are placed in the arrays as follows:

iterations are required; processor 5483 will have the indices n = [(s)(MaxProcessors) + 5483]div N as the segment counter s goes from 1 to 32. Its m index will remain 362. At any one segment s the MP-1 and MP-2 assemble 16 384 stiffness matrices [K] and solve for the 16 384 corresponding vectors simultaneously. After [K] has been assembled and {a} solved at a segment, new values are placed in {r/s} and the assemble and solve steps are repeated. The array {09+} needs to be updated only if the segment size is not a multiple of N. The loops over m and n are replaced by a single loop over s which counts over segments, in other words, over the groups of data which can be operated on simultaneously. Since the segments are large, the iteration count for the loop over s is small. At each s we fill the arrays {a~,} and {~/s} with new values of wavenumbers (and frequencies if necessary), then simultaneously solve for the corresponding {tT+}. Pseudocode for the parallel version of SPlaDyn appears in Table 3. Steps in boldface are computed in parallel, that is, each processor simultaneously operates on the terms involving one element of variables indexed by s or m'n. These parallel operations comprise over 99.9% of the computations needed to find the spectral transfer function (~P.

Performance results {Ogs} =

{~s}

{O)1,

={~0,

O92 " " " ( ' 0 N , ( D I ,

~0"

o92 " " " OgN, O91, 602''

" " 170, ~ ] 1 , ~ 1 " " " ~ 1 ,

~m',

~m'"

"('ON}

" " ~/m')

where N x (m' + 1) = MaxProcessors. Corresponding to the array of frequencies {cos} and array of wavenumbers {q+} is an array of stiffness matrices JR,] and an array of generalized nodal unknowns {tT+} where

[L] = {[R,. =o..=,1, [~,.~o.. =,] ....

. . . . .

[era =o,. = N], TI efficiency = NTN x 100%,

[ g m ~ m',n = N]}

{/ds} = {{/~mffi0,n= 1 }, {/~mffi0, n=2} .....

{UrafO,n=N},

. . . . {fim=m'.~=N}}" All terms of a single s subscript reside on a single processor. F o r example, if N = 1024, M = 512 and all 16 384 processors were used, processor 5483 would contain the terms n = 5483 d i v N =

5483 div 1024 = 5

m = 5483 m o d nN = 5483 -- 5(1024) = 362 CSE 5 / 4 ~

D

We evaluate the merits of our parallel solution scheme by measuring execution times needed to generate the frequency response {~} at one point on the box beam structure shown in Fig. 2 on the MP-1 using varying numbers of processors, and on some c o m m o n workstations. The portion of the code which is timed is the computation of the spectral transfer function GP. A summary of the timing results is given in Table 4. We define parallel efficiency using N processors as T512

Tt = 512 "

The technically correct definition has T l equal to the solution time on a single processor but this measure is highly impractical and wasteful of resources on S I M D computers; the estimated C P U t i m e for our test problem on one MP-1 processor is 3.17 × l0 5 s - more than 3.6 days. The most remarkable result is not the fast solution times but the virtually linear increase of performance with the number of processors. Figure 3 shows how solution speed (measured as the inverse of solution time) varies with the number of processors. The linearity of this relationship vividly illustrates how

382

ALBERT N. DAN1ALand JAMESF. DOYLE

120

- ....... .............

experimental recon(#2) recon(#3)

......

recon(#4)

fr-~ '

~'.,.

,'.;', ,',.',, .,'1

ti ",.,,.i J

-40

,

,

,

,

I

1000 Time

,

i

,

i

I

2000 [ps]

=

i

,

,

/

4000

3000

120

experimental .......

recon(#2#3) recon(#2#4) recon(#3#4)

.............

......

• The n u m b e r of tasks is enormous. • The tasks are data-independent. • The tasks are equally sized and require identical operations. • The individual tasks need small amounts of memory.

40

o

C

-4(

problem is subdivided, the more processors that can be used. Second, that we are superimposing behavior at different frequencies--linear structural behavior is independent at each frequency, so the problems are all data-independent of one another and need not share information during the solution phase. In addition, the spectral method itself has two additional properties which contribute to the method's effectiveness on multiple processors. The method gives exact solutions over large regions so that few elements are needed to model large structures. The computer memory requirements are therefore also small. Also, the task of generating the dynamic stiffness matrix and solving a system of equations with this matrix is identical at each frequency. Load balancing across processors is therefore not an issue, since each processor performs the identical operations. The spectral element method therefore reaps the benefits of massively parallel computation for the four reasons summarized below:

I

,

J

,

,

I

,

J

i

d

,

,

i

i

I

2000

1000

i

,

i

,

3000

40100

Time [ITS]

120

-

-

experimental

recon(#2#3#4)

. . . . . . .

By 'task' we refer to the creation of [/~] and the solution of [/~]{~} = {r} at each co and 1/. Having demonstrated the ability to compute forward solutions (and spectral transfer functions) rapidly we now describe how many of these solutions may be used to infer u n k n o w n parameters describing the structure or the applied load. The method relies on the ability to reconstruct forces from responses through the transfer functions, so this procedure is discussed first. I N F O R M A T I O N

-40

.

.

.

.

I

,

1000

,

,

,

I

2000 Time ~ ]

,

,

,

b

I

,

,

3000

I

,

1

4000

Fig. 6. Reconstructed forces for impact of a 51 x 127mm box beam. the spectral element method is eminently suited to multiprocessor computers. D I S C U S S I O N

O F

P A R A L L E L

I M P L E M E N T A T I O N

The reason spectral methods are so well suited for parallel processing lies in the m a n n e r in which the structural dynamics problems is decomposed into smaller problems. Spectral analysis takes the view that a structure's dynamic response is a superposition of response at m a n y separate frequencies. This statement implies two points vital to parallel processing: first that there are many frequencies--the more a

R E L O C A T I O N

In the inverse solution method described later we take measurements at different locations and extract the required information essentially by comparing these measurements. In order to compare the measurements in a c o m m o n basis we first relocate the information from the sensor's positions to a single location. Force identification is the process of determining the force history of an applied load from response information such as strain or acceleration histories. The literature on force identification on impacted plates describes methods primarily limited to simple plate geometries, l~13 or simple load conditions. 14 With the spectral element method, however, we are able to perform force identification on complicated structures subject to arbitrary loads. Force identification interests us because it is a convenient means to relocate the sensor information. Two structures are considered: a large fiat alumin u m plate, and a thin-walled a l u m i n u m box beam

Massively parallel implementation of the spectral element method

383

Fig. 7. Contours of score for impact near a free edge.

comprising four plates joined at 90 °. Figure 4 shows the equipment used to collect acceleration and force data from the plate structures. A force transducer with a hemispherical contact surface is mounted on a pendulum so that it strikes the structures at the lowest part of the pendulum's trajectory. The flat plate is supported vertically by two clamps along the bottom edge and has the other edges free, while the box beam is clamped at its ends. Three accelerometers record transverse acceleration. Their signals are preconditioned by pre-amplifiers and recorded with Norland 3001 waveform analyzer. Up to 4096 data points are measured at a rate of two or five micro-

seconds, and a computer receives the digitized signals via an RS-232 link.

Force reconstruction by deconvolution The spectral method allows force reconstruction problems to be tackled directly because the underlying deconvolution is performed in the frequency domain and hence is merely an algebraic division at each frequency. For example, using Eq. (12) we can express the acceleration at point q in terms of a load at point p as

dq., = (uo,)~Gu,,P,,.

(13)

384

ALBERT N. DANIALand JAMESF. DOYLE

Fig. 8. Contours of score for impact of 51 x 127mm (2 x 5") box beam.

Deconvolving the force t3 from a known acceleration at q simply involves inverting Eq. (13)

~lq,n

t5,, = (i°9,,)ZGPq,,,"

(14)

In practice however, direct applications of Eq. (14) may lead to poor force reconstructions. The reasons are two-fold: • The deconvolution of force from acceleration implicitly performs an integration since, for plate structures, applied forces and acceleration responses are related by approximately one time derivative. It can be shown analytically that for an infinite plate with no damping, the applied force history is directly proportional to the velocity response at the impact site. Therefore, if we were able to transfer acceleration at a

remote location (such as the sensor's position) to the impact site, we would still have insufficient knowledge to reconstruct the force completely. Note that strain gauge data on the other hand involves no time derivatives of plate motion and can therefore yield force history directly as done in Refs 11, 15 A spectral transfer function (~Pqcould have zeros at certain frequencies making some force components singular, or it could have large peaks which make some force components very susceptible to errors in the measured response.

Stable force reconstruction Rather than directly apply Eq. (14) to recreate forces, we use a modified form of this equation to compute force rates Rn. This equation is much less likely to have numerical problems:

Massively parallel implementation of the spectral element method

FalnGln+a2,G2,+ +aqnGqn7 X -" . . . . . -~--~-'~'---~-'~;T. . . . z.~.:2~"

L

J (15)

where ~GP* ito the complex conjugate of (~{,, and q,n p p* Gqp 2 -=- GqGq . The real scalar N is random noise which varies between 0 and ~'~max at each frequency. Reconstructions based on Eq. (15) are numerically more stable than Eq. (14) because multiple sensors tend to have zeros at different frequencies, thus no individual sensor needs to carry the burden of providing information across a full frequency range. In addition, the sum of transfer function magnitudes and the addition of positive-only noise virtually guarantees that the denominator of Eq. (15) will never be zero. Additionally, we are free to reconstruct at any time derivative of data (via the exponent j ) since they all contain essentially the same information. Force identification on the box beam

To illustrate the types of force identification solutions possible with the spectral element method we present responses and reconstructed forces on the thin-walled box beam. Figure 5 shows flexural acceleration responses at three locations on the 51 m m × 127mm (2" × 5") thin-walled box beam. The locations of the accelerometers and the impact site are given in Fig. 4c. These responses are typical in that they show the complicated interaction of flexural wave dispersion, transmission, and reflection which occur in plate structures. Superimposed on each experimental response is the spectrally computed forward solution using the experimentally measured force history. The traces agree well, especially for the early portion of the traces (1200-2000/is). It can be shown with throw-off spectral elements that the flexural waves which originate at the impact site propagate around the perimeter of the box beam several times in the span of these 800 p s. Force reconstructions computed in Eq. (15) using individual, pair-wise and triple combinations of the three experimentally recorded accelerations are shown in Fig. 6. The clear trend is that the quality of reconstructions improves as more sensors are combined. More importantly, this figure shows the disparate information of Fig. 5 relocated to a c o m m o n basis.

FORCE

LOCATION

SOLUTIONS

The inverse problem of force identification is relatively straight forward with the spectral element method, and carries the same computational cost as the forward problem. Other inverse solutions such as determining the location of impact or identifying a structural property such as a flaw are much more

385

computationally intensive because an iterative approach is required. We demonstrate our method for finding the (x, y) of an impact site in the case of the flat plate, and the (x, y, z) of impact on the box beam. In simple terms, the method begins with random guesses for the unknown parameters. Spectral transfer functions based on the experimentally measured dynamic response from each sensor are used to reconstruct a force history. The separate force histories are then compared to each o t h e ~ i f they are in close agreement, the values of the unknown parameters must be accurate. Comparison of.force rate reconstructions

A primary issue in force location based on comparing force (or force rate) reconstructions is how to quantify the comparison. The purpose of the comparison is to indicate how close the guess for the impact site k, is to the actual location of impact. In this work the comparisons are quantified in a single parameter called 'score'. This parameter should • have a global maximum or minimum value at the impact site, • deteriorate proportionately as distance from the impact site increases. There are many ways to measure how well several force reconstructions agree with one another. Here we use a scheme which assigns scores so that disagreement between functions is penalized. The scoring function is diff(/~,,/~,) -- k x/(/~,., -/~j,,) × ([¢,,,,- R,.,,)"* n= 1

(16) score = diff(/?,,/?j) + diff(R, /~k) + diff(Rj, kk ). (17) Again, the asterisk denotes a complex conjugate. In this case, when the force reconstructions from each of the three sensors are in agreement, the overall differences are small and the impact location problem becomes a search for the minimum score. Figure 7 shows contours of score on a 300 x 220 mm area of the flat plate near the right free edge. Impact occurs at a point 58.7 m m from the edge. The impact site defines the origin of our coordinate system so the sensors are located at ( 2 8 . 7 , - 3 7 . 2 ) m m , (28.7, 49.3)mm and ( - 8 4 . 0 , 49.3) ram. The entire 300 x 2 2 0 m m region is discretized in a regular grid having a spacing of 5 mm (0.2") and contains 2684 grid points. Each point represents score based on force rate reconstructions from three sensors, so more than 8000 force identification solutions are represented. The location of minimum score is at (10, 5)mm, approximately one centimeter from the experimentally measured impact site. Note that on the

386

ALBERTN. DANIALand JAMESF. DOYLE mum score is difficult to see in this contour plot. The global minimum score is at (0, - 5, 0) mm, just 5 mm from the measured impact site at (0, 0, 0). The search space of 250mm~
semi-infinite plate, scores have global minima along the free edge itself. This is expected since the boundary conditions have a traction-free edge there, in other words, the force along the edge is zero and therefore the differences will be zero. The plot shows scores ending 15 mm from the edge, and not actual edge values. The MasPar MP-2 takes approximately 20 rain to generate the scores. Table 4 indicates that the RS/6000 would spend more than 41 h on this problem. Clearly, the parallel reformulation of our governing equations and the method's impelementation on a large multiprocessor computer enable problems to be solved which are otherwise impractical. Contours of score on the 51 × 127 mm (2" × 5") box beam are shown in Fig. 8. The figure shows the four faces of the box beam in two exploded sections to allow simultaneous viewing of scores on all surfaces. The lower 'L' section represents the bottom and back faces of the beam while the upper inverted 'L' represents the top and front faces. Due to the large variation of scores, the point of global mini-

L

L

---;~" .....

.....

;/ ..... I

L

.....

;,'- .....

-2 "" .... I

t

.....

-,, . . . .

-/-

Automatic methods to f i n d minimum score

The computational demands of generating contour plots of scores based on comparisons of force reconstruction are high. The purpose of quantifying comparisons into a single score parameter for each guess to ~p is to find the optimum score and hence the correct impact site--the full contours need not be generated if a better method to finding the optimum value is available.

/

- -/

I S

-2"" .....

-S- - - -/

generations

." "7---~y /

....

;i ..... r

-,,_ . . . . . d

_,'. _ _ _ / I S

Fig. 9. Convergence of the genetic solution to the impact site for impact near a free edge. Every fourth generation is shown.

Massively parallel implementation of the spectral element method The main difficulty in searching for minimum scores computed with the difference method (Eq. 17) is that many local minima exist, precluding search methods such as simple implementations of steepest descent. 16 In this section we discuss the relative merits of two search schemes, brute force and the genetic algorithm. The two contour plots are generated with the brute force m e t h o d - - t h e search domain is discretized into a regular mesh, and each grid point is evaluated as a possible impact location. The maximum or minimum score is found simply by traversing the list of scores while keeping track of the optimum then cross-indexing this to the grid coordinates. The scheme has two strong points: if the mesh is fine enough, and if the search domain contains the global maximum or minimum, the method cannot fail to locate the optim u m value. In addition, the method is entirely parallel since each grid point can be scored independently of other points. Thus, if there were to exist a computer with 2684 x 1024 x 512 ~- 1.4 x 10 9 processors, the impact location problem on the infinite plate could be found in a single iteration containing every [Km,,]~Um° ~ = {r} solution at each grid point. The weaknesses of the brute force method are that it is difficul to a priori set the required grid spacing; that the position of the impact will be determined only to within a half-grid unit; and that the method is computationally expensive compared to non-exhaustive search schemes. The genetic algorithm introduced by Holland ~7 is a stochastic optimization method which begins with a set of random guesses to the solution. The guesses are referred to as individuals, and the set is called a population. After individuals are selected, they are scored according to how well they solve the problem. Characteristics of the best-scoring individuals are combined to produce a new population. After several generations the population tends to drift to regions of better score. Aside from the scoring method which is problemdependent, three additional criteria must be established. One must decide 1. how the parameters to be optimized are encoded in an individual's genes, 2. how parents are chosen for future generations, 3. how offspring are produced from parents. Complete details of these and other issues involved in our implementation of the genetic search method can be found in Ref. 18. To demonstrate the economy of the method, consider Fig. 9 which shows a population of 30 individuals converging to the correct impact location on the flat plate impacted near an edge in just 24 generations. The total number of points considered is therefore 720, nearly four times fewer than the brute force scheme on the regular grid. In addition, the genetic search converged to an impact location of (9.7, 3.8)mm which is more accurate than the grid solution. It is interesting to note the presence of genetic mutations at the upper-most search plane.

387

DISCUSSION Elastic waves propagating through a structure contain information about the structure and the loads acting on it. We have shown that information about the impact (its location and force history) can be extracted accurately using flexural acceleration data from multiple sensors. Other parameters relating to structural properties can be identified with the methods described here as well. Our general approach to the inverse problem, however, is feasible only by reformulating the governing equations in a highly parallel manner and implementing the solution scheme on powerful multiprocessor computers such as the MP-1 and MP-2. A signficant benefit of the ensuing computational performance is that it allows us to experiment with many variations of the algorithm as well to test limiting cases. Such experimentation would be impractical with conventional approaches to structural dynamic inverse problems. Some of the variations being explored are • Defining a heuristic for a 'reasonable" search domain that encompasses the space around and between the sensors. • Determining the number of sensors, and the sensor's relative positions, sufficient to uniquely obtain the location of any impact inside the search domain. In addition, we hope to solve other interesting inverse problems on plate structures such as identifying cracks and delamination. Acknowledgements The MasPar MP-1 at Purdue University's Parallel Processing Laboratory is supported in part by NSF Parallel Infrastructure Grant ~CDA-9015696. We thank Dr. Hank Deitz from Purdue's School of Electrical Engineering for making this computer available to us. and Dr. Will Cohen for assistence with the initial programming. The MasPar MP-2 used in this research was made available by the MasPar Computer Corporation as part of the 1994 MasPar Challenge contest. This research was funded in part by the Purdue Research Foundation. REFERENCES

1. C. J. Wiernicki, "Damage of ship plating due to wave impact loads," Proc. 1lth Ship Technology & Research Symp., Portland, OR, USA, pp. 151 179 (1986). 2. M. Rodriguez and C. Diaz, "'Analysis of the seismic performance of a medium rise, waffle flat plate building," Earthquake Spectra 5(1), 25 40 (1989). 3. K. H. Chu and E. Dudnik, "Concrete box girder bridges analysed as folded plates," Concrete Bridge Design SP-23, 221-246 (1969). 4. K.-J. Bathe, Finite Element Procedures in Engineering Analysis, Prentice-Hall, Englewood Cliffs, 1982. 5. S. A. Rizzi and J. F. Doyle, "A spectral element approach to wave motion in layered solids," J. Vibration Acoustics 114, 569 577 (1992). 6. J. F. Doyle, Wave Propagation in Structures, SpringerVerlag, New York, 1989. 7. LI. G. Chambers, "Sommerfeld type diffraction problems for a plate," Z. Angew. Math. Mech. B42, 545 555 (1962).

388

ALBERT N. DANIAL and JAMES F. DOYLE

8. I. N. Sneddon, Fourier Transforms, McGraw-Hill, New York, 1951. 9. A. N. Danial, S. A. Rizzi and J. F. Doyle, "Dynamic analysis of folded plate structures," J. Vibration Acoustics (in press). 10. J. E. Michaels and Y. H. Pao, "Inverse source problem for an oblique force on an elastic plate," J. Acoustical Soc. America, 77, 2005-2011 (1985). 11. J. F. Doyle, "Experientally determining the contact force during the transverse impact of orthotropic plates," J. Sound Vibration, 441-448 (1987). 12. J. D'Cruz, J. D. C. Crisp and T. G. Ryall, "Determining a dynamic force on a plate. A non-linear inverse problem," Vibration and Noise-Measurement Prediction and Control National Conference Publication--Institution of Engineers, Australia, No. 90, Part 9, pp. 218-221, 1990.

13. J. D'Cruz, J. D. C. Crisp and T. G. Ryall, "On the identification of a harmonic force on an viscoelastic plate from response data," J. Appl. Mech. 59(4), 722-729, Dec. 1992. 14. C. Chang and W. Sachse, "Analysis of elastic wave signals from anextended source on a plate," J. Acoustical Soc. America, 77(4), 1335-1341, 0985). 15. J. F. Doyle, "Determining the contact force during the transverse impact of plates," Exp. Mech. 27, 68-72 (1987). / 16. W. H. Press, S. A. Teukolsky, W. T. Vetteding and B. P. Flannery, Numerical Recipies in Fortran: The Art of Scientific Computing, Cambridge University Press, 1992. 17. J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975. 18. A. N. Danial, Inverse Solutions on Folded Plate Structures, Ph.D. Thesis, Purdue University, 1994.