Development of parallel 3D RKPM meshless bulk forming simulation system

Advances in Engineering Software 38 (2007) 87–101 www.elsevier.com/locate/advengsoft Development of parallel 3D RKPM meshless bulk forming simulation...

Download PDF

2MB Sizes 1 Downloads 65 Views

Report

PDF Reader
Full Text

Advances in Engineering Software 38 (2007) 87–101 www.elsevier.com/locate/advengsoft

Development of parallel 3D RKPM meshless bulk forming simulation system Wang H. *, Li Guangyao, Han X., Zhong Zhi Hua The State Key Laboratory of Advanced Technology for Vehicle Design and Manufacture, College of Mechanical and Automotive Engineering, Hunan University, China Received 21 November 2005; received in revised form 25 July 2006; accepted 9 August 2006 Available online 22 September 2006

Abstract A parallel computational implementation of modern meshless system is presented for explicit for 3D bulk forming simulation problems. The system is implemented by reproducing kernel particle method. Aspects of a coarse grain parallel paradigm—domain decompose method—are detailed for a Lagrangian formulation using model partitioning. Integration cells are uniquely assigned on each process element and particles are overlap in boundary zones. Partitioning scheme multilevel recursive spectrum bisection approach is applied. The parallel contact search algorithm is also presented. Explicit message passing interface statements are used for all communication among partitions on diﬀerent processors. The parallel 3D system is developed and implemented into 3D bulk metal forming problems, and the simulation results demonstrated the eﬃciency of the developed parallel reproducing kernel particle method system. 2006 Elsevier Ltd. All rights reserved. Keywords: Parallel; Reproducing kernel particle method; Bulk forming; Domain decomposition method; Partition; Message passing interface; Speedup

1. Introduction The non-linear ﬁnite element formulations for non-linearity of geometric and material have been well developed and a lot of signiﬁcant works have been completed in bulk forming analysis. Nevertheless, FEMs are still ineﬀective in dealing with some extreme material distortions owing to severe mesh distortion. There are diﬃculties in solving problems involving large deﬂections and moving discontinuities. Typical problems include extremely large deformations in manufacturing processes, the propagation of interface between solids and liquids in casting, the propaAbbreviations: RKPM, reproducing kernel particle method; EFG, element free Galerkin method; PIM, point interpolation method; SPH, smoothed particle hydrodynamics; FEM, ﬁnite element method; RSB, recursive spectrum bisection; MRSB, multilevel recursive spectrum bisection; DDM, domain decomposition method; MPI, message passing interface; EBC, essential boundary condition; SYMMLQ, sparse symmetric equations; PE, process element. * Corresponding author. Tel.: +86 731 4329 266; fax: +86 731 8821 445. E-mail address: [email protected] (H. Wang). 0965-9978/$ - see front matter 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.advengsoft.2006.08.002

gation of cracks with arbitrary and complex paths in failure, and the tracking of the growth of phase boundaries and microcracking in advanced materials development. Some of FEMs require remeshing models in large deformation conditions; it still requires considerable computational eﬀorts. Meshless methods eliminate mesh distortion for both large shape design changes and large deformation non-linear analysis. A variety of meshless modeling methods have recently emerged [1]. RKPM was proposed by Liu et al. [1–5,39] to improve the accuracy of the SPH [27–29,31,32] method for ﬁnite domain problems. These methods have advantages over traditional FEMs for their ability to handle large deformation problems without mesh distortion, and for their solution accuracy due to the large domain of inﬂuence covered by particles/nodes. In this method, the kernel function is modiﬁed by introducing a correction function to meet the reproducing conditions. The resulting modiﬁed kernel function exactly reproduces polynomials to a speciﬁc order and thereby fulﬁlls the completeness requirement. The shape functions developed from the method were later

88

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

proven to be equivalent to moving least-squares kernel interpolates if linear basis functions were used [1–3]. Liu et al. also introduced wavelets [5] as the kernel functions and successfully applied RKPM to multiple scale analysis. More recently, some eﬀort has also been devoted to the solution of bulk metal forming processes by means of meshless approaches. Referenced papers encompass the following operations: ring compression [6,7], upsetting [6– 13] and extrusion [6,7,10,14,15]. Meshless methods encounter the major technical barrier-low computation eﬃciency. Because meshless methods are based on the high order interpolation, the computation time of meshless requires much longer than FEMs. By taking a general view to the content of these papers it appears that major results are generally limited to 2D problems and simple 3D problems. Therefore, meshless methods are not wildly applied in practice. Parallelization of meshless codes is the way to settle expense of meshless computations. The parallel meshless methods to decompose the domain into several subdomains to solve are called DDM. DDMs are useful in two contexts. First, the division of problems into smaller problems through usually artiﬁcial subdivisions of the domain is a means for introducing parallelism into a problem. In this manner, problems that are intractable on serial computers can be solved on parallel computers. Second, many problems involve more than one mathematical model, each posed on a diﬀerent domain, so that domain decomposition occurs naturally. DDMs are wildly used for explicit FEM problems [42,43,45]. It is natural that DDMs [44] can advance the meshless computation eﬃciency easily. The present treatment will focus on the RKPM for explicit dynamic analysis of bulk metal forming problems, but the procedures can directly apply to some other meshless methods (e.g. EFG [1,30], PIM [16]) without diﬃculties. Several distinct advantages of RKPM are its ability to accurately model extremely large deformations without mesh distortion problems and its ease to adaptive modeling by simply changing particle deﬁnitions for desired reﬁnement regions. In this paper, an overview of a Lagrangian RKPM for non-linear explicit dynamic analysis is ﬁrst given. A general description of the parallel implementation is described. The parallel procedure primarily consists of a mesh partitioning pre-analysis phase, a parallel analysis phase that includes explicit message passing among partitions on separate processors. In the ﬁnial, the numerical examples have been presented to demonstrate the eﬃcient parallel 3D RKPM system.

2. Weak form and discretization of RKPM for contact problems The reproducing kernel approximation [1–5] of a function u(X) in a domain Xx is expressed

ua ðX Þ ¼

Z

Ua ðX Y ÞuðY ÞdXx ;

ð1Þ

Xx

where ua(X) is the reproduced of function u(X), and Ua is the window or kernel function with compact support. In the following X represents the material coordinates, x is the spatial coordinate and u is the displacement of the particles and t denotes the time. Discretizing the domain Xx by a set of particles {X1, X2, . . . , XNP}, where NP is the total number of particles, the integral is approximated by the following summation: uðX ; tÞ ¼

NP X

N I ðX ÞuI ðtÞ;

ð2Þ

I¼1

where NI(X) is the Lagrangian shape function deﬁned by N I ðX Þ ¼ CðX ; X X I ÞUa ðX X I ÞDV I :

ð3Þ

C(X; X XI) is the correction function and DVI is the volume of particle I. The more details on the construction of the shape function of RKPM can be found in Refs. [1–5]. For an approximation with the virtual work principle, the essential boundary conditions (EBCs) must be satisﬁed directly by the interpolation functions or accommodated by augmenting the variational statement with constraints. A major diﬀerence between RKPM and other methods (e.g. FEMs) is the manner in which EBCs can be enforced directly. The non-local interpolation condition of equation poses an additional computational challenge. Whereas EBCs for ﬁnite elements are imposed locally at particles (because they possess the Kronecker delta property), The EBCs enforcement with RKPM are non-local over a patch of particles/nodes. In some cases, the EBCs can be adequately approximated by local speciﬁcation at the particles (assuming a Kronecker delta property). This approximation can be accurate, by St. Venant’s Principle, when the primary regions of interest are away from the EBCs. In general, however, a coupled set of equations is usually solved, even for explicit analyses. Previous eﬀorts used Lagrange multipliers to constrain the variational statement [17] or a set of simultaneous equations is directly solved. In either case, signiﬁcant computations were generally necessary to enforce the EBCs. These procedures also are not well suited for parallel processing, since they must generally be made over multiple processors. Kent [18] proposes an alternate approach that may also require signiﬁcant computational eﬀort, as it is algebraically equivalent to other existing equations solving methods. By treating the imposition as a transformation of the interpolation functions, however, this form is better for parallel processing. Describing the EBC equations by gB ¼

NP X J ¼1

N J ðX B ÞuJ ;

ð4Þ

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

89

where gB is the speciﬁed EBC at XB or can be an integral relation along XB. It is written in matrix form for all conditions

present normal contacting force and frictional force, respectively. The principle of virtual work can be written as

GT d ¼ g;

dW int dW ext dW kin dW c ¼ 0:

ð5Þ

ð16Þ

where by Gram–Schmidt or Householder orthogonalization

Substituting the particle approximations (2) and (11) into (16) the discretizing equation is obtained as

GT G ¼ I;

mI €uI ¼ fIext þ fIcont fIint ; Z Z ext fI ¼ uI b dX þ uIt dC; X0 Ct Z int rX uI P dX; fI ¼

JG ¼ 0;

J T J ¼ 1:

ð6Þ

The generalized variables are represented by d ¼ J^ u þ Gg:

ð7Þ

And we can obtain ð8Þ

ouðX Þ oN T ðX Þ oN T ðX Þ ¼ J^ uþ Gg: oX oX oX

ð9Þ

Note that the constraint conditions are recovered. That is ð10Þ

Since the new transformed interpolation functions of equation satisfy the EBCs, they can be directly applied in the virtual work statement. The alternate displacement vector ^u, is now determined with the EBCs directly speciﬁed in g. The orthogonalization in Eq. (6) may require a signiﬁcant number of computer operations. These computations and the creation of the alternate interpolation functions of Eqs. (8) and (9), however, are done only once in the preprocessing phase. Transient displacement conditions are then accommodated by only changing the values in g. The gradient of displacement is rX u ¼

NP X

rX N I ðX ÞuI :

ð18Þ ð19Þ

X0

u þ N T ðX ÞGg; uðX Þ ¼ N T ðX ÞJ ^

GT d ¼ GT ðJ ^ u þ GgÞ ¼ g:

ð17Þ

ð11Þ

I¼1

The principle of the virtual work is used to formulate the general contact problems. The virtual work done by a stress ﬁeld is Z rX du : P dX: ð12Þ dW int ¼ X0

The virtual work done by the external loads, the inertial force and the contact traction through the virtual displacement ﬁeld are denoted by dWext, dWkin and dWc, and are calculated as Z Z ext dW ¼ du b dX þ du t dC; ð13Þ X0 Ct Z q0 du € u dX; ð14Þ dW kin ¼ X Z 0 ðqN dcN þ qT dcT ÞdC; ð15Þ dW c ¼ Cc

where P is the nominal stress tensor, q0 is the density in the reference conﬁguration, t is the traction force, b is the body force, cN and cT denote the relative displacement in normal and tangent direction at the contact interface Cc, qN and qT

where mI denotes the mass of particle I, fcont is the contact force whose details are described in Refs. [29,30]. 3. Central diﬀerence algorithms For explicit time integration methods, central diﬀerence algorithm [30] is popularly being used and it can be obtained by writing down the following central diﬀerence expressions for velocity vn and acceleration an and vectors vn ¼ ðd nþ1 d n1 Þ=ð2DtÞ;

ð20Þ 2

an ¼ ðd nþ1 2d n þ d n1 Þ=ðDt Þ:

ð21Þ

Substituting these expressions in the dynamic equilibrium equations, and rearranging terms it can be written as 1

d nþ1 ¼ ðM þ C2DtÞ ½Dt2 ðpn1 fn1 Þ þ 2Md n ðM CDt2Þd nþ1 ;

ð22Þ

where p is the internal force vector. This is a recurrence relation in dn. If both M and C are diagonal matrices of mass and damp, the solution for dn+1 does not require factorization. Since the central diﬀerence algorithm does not require assembly of global matrices and can be handled at degrees of freedom level, the parallel implementation is pretty straightforward. The interprocessor communication requirement per time step is very small when compared to the implicit algorithms. So the DDM parallel method based on MPI is the best choice for explicit algorithms. 4. The explicit parallel algorithm for RKPM development As FEM explicit parallel algorithms, separate pre-analysis software is created to partition any general unstructured RKPM models. Similar to explicit FEMs [33–35], in this section, DDM is implemented for parallel 3D RKPM system. We mainly focus on the partition methods and data structures for complex communication among subdomains. 4.1. Partition 4.1.1. Partition scheme for meshless methods For eﬀective parallel computing, it is critical to balance the computational load among processors while minimizing

90

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

interprocessor communication. If tasks are not distributed in a balanced way, system may end up waiting for one task to complete while other tasks are idle. Therefore, separate pre-analysis software is created to partition any general unstructured RKPM models. Similar to explicit ﬁnite element parallel algorithms, it requires partitioning RKPM model ﬁrst. Unlike the FEM, The shape function of meshless is interpolated in dynamics domain according to dilation parameter a. The integration point of meshless is not limited inside of the support domain. Program need to keep all integration and interpolation information in each process elements (PE) entirely. There are two strategies for partition process, partition based on particles and partition based on integration cells. The integration point partition is used in this work due to two advantages: 1. When particles based partition strategy is used, particle overlap domains are generated. And then, we should assign integration cells to each subdomain according to particles. As we know, unlike FEMs, particles may not belong to only one integration cell, thus, overlap integration cell domains should be generated. We have to design data structures for both particles and integration cells, respectively, therefore, the complexity and storage size are both enlarged a lot. On the other hand, if the integration cells based partition strategy is implemented, integration cells should be ﬁrst partitioned and no cell overlap subdomain should be generated. We only assign particles to relevant subdomains according to integration cells. 2. We can directly use FE partition methods for integration cells instead of reconstructing connected relationship with discrete particles of global domain. 4.1.2. Multilevel approach for integration cells Like FEMs, integration cells partition problems can be easily converted to graph partition problems [41,46]. So we discuss graph partition approaches to present the procedure for integration cells of RKPM method. The RSB [19] algorithm is one of a class of recursive bisection methods for partitioning unstructured problems. The most straightforward implementation of RSB, which uses the Lanczos algorithm [20] to ﬁnd the Fiedler vectors, is unacceptably slow for many applications. The MRSB [21] is a reﬁnement of the algorithm that is much faster, typically by an order of magnitude or more, and has been instrumental in the acceptance of RSB. The basic idea behind MRSB is to speed up the Fiedler-vector computation by constructing a series of successively smaller contracted graphs that maintain the global structure of the original graph. An algorithm based on the multilevel approach normally comprises three phases. 1. Coarsening phase: the original graph G is reduced into a series of successively coarser graphs. There are several ways to coarsen a graph. In the maximal independent set of a graph is chosen as the vertices of

the coarse graph. An independent set of the graph G is a set of its vertices, with no two of the vertices in the set connected by an edge of G. An independent set is a maximal independent set if the addition of an extra vertex makes it no longer independent. The most popular method for generating the multilevel of coarse graphs is based on edge collapsing. Selected edges are collapsed so that two vertices connected by one of these edges form a multi-node. Each vertex of the resulting coarse graph has a weight associated with it, which indicates the number of original vertices it contains. Each edge of the coarse graph also has a weight associated with it that indicates the number of original edges it represents. This edge collapsing approach has the distinctive advantage that the edge-cut (generalized here as the sum of the edge weights cut by the partitioning) on the coarsest graph equals the edge-cut of the ﬁnest (original) graph, if the partitioning on the coarsest graph are inherited by the ﬁner graphs without further reﬁnement. The edges are usually selected using the idea of matching. A matching of a graph is a set of edges, such that no two of them share the same vertex. The maximal match is a matching of the largest size. In a heavy edge matching [22], vertices are visited randomly. For each vertex, the heaviest unmatched edge from the vertex is selected. Heavy edge matching has the advantage that the resulting coarse graph has a relatively small total edge weight, and therefore the partitioning of it is more likely to give a small edge-cut. As an alternative to edge collapsing based on maximal matching, a more aggressive coarsening approach has been employed [23], where the greedy algorithm is used to form small clusters of, say, 10 vertices for collapsing. The coarsening process is applied until the graph has less than a preset minimum number of vertices. 2. Partitioning phase: partition the coarsest graph Gl into n parts. As the coarsest graph is of very small size, any partitioning algorithms will be able to partition it rapidly. The partition is executed in a manner such that the summation of the vertex weights of each subgraph should be roughly the same. When edge collapsing approach is used for graph coarsening, the edge-cut of the partitioning is the same as that for the ﬁnest graph, should the partitioning be inherited all the way to the ﬁnest graph without reﬁnement. The choice of partitioning algorithms at this coarsest level is not very important [22,24] to the overall quality of the partition because reﬁnement will be carried out at the subsequent ﬁner levels of the graph. 3. Uncoarsening and reﬁnement phase: the partitioning of Gl is interpolated to Gl1 and reﬁned, the process repeated and terminated at G. During the uncoarsening and reﬁnement phase, the partitioning of a graph Gl is inherited by the ﬁner graph Gl1. For the multilevel recursive spectral bisection (MRSB) algorithm [21], the eigenvector for the coarse graph Gl is interpolated to the graph Gl1 as the ﬁrst approximation

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

of the eigenvector for Gl1. From there, instead of using the Lanczos algorithm to recalculate the eigenvector, Rayleigh iterations are used to improve the approximation. The resulting positive semi-deﬁnite linear systems are solved using sparse symmetric equations (SYMMLQ) [22]. Because a good initial approximation exists, the number of matrix–vector products involved in ﬁnding the Fiedler vector using the Rayleigh quotient algorithm is normally no more than ten, as opposed to up to several hundred if the Lanczos algorithm is used. Thus the MRSB algorithm is about an order of magnitude faster than the single level Lanczos implementation of RSB. But there are two disadvantages to the multilevel spectral bisection approach. First, this is still a bisection algorithm and the partitioning of a graph into many subdomains would involve recursive use of the algorithm. Second, even though the Rayleigh quotient procedure is substantially faster than the Lanczos algorithm, each iteration is still of the order O(n), where n is the number of vertices in the graph. However, all that may be needed on the ﬁner graph Gl1 is some reﬁnement of the subdomain boundary to reduce edge-cut and to maintain the load balance. For a large graph with a good existing partition, the size of the boundary should be only a small fraction of that of the interior. Therefore, multilevel algorithms that adopt Kernighan–Lin (K–L) [26] type algorithms during the uncoarsening and reﬁnement phase have proved themselves to be superior to MRSB [23–25]. The boundary reﬁnement approach is not restricted to bisection and n-way reﬁnement to the existing partition could be used. In [25] vertices of any subdomain are allowed to move to any of the other n 1 subdomains, while in [23] boundary vertices are allowed to move to the neighboring subdomains only. 4.1.3. Comparison between FE and meshless partition results Meshless methods use the particle values in the local support domain to interpolate the shape functions. The RKPM approach needs about 20–30 particles in a support domain to construct the shape functions in this work, and the area of overlap zone depend on the number of particles in the local support domain. Thus the partition results of RKPM meshless models will generate much wider range boundary zones than FEM models. Comparing the partition results of FEM and RKPM models, the boundary of each subdomain is dissimilitude. Fig. 1 shows that the boundary between two partitions of FE model is a line. The integration points of each element are contained the inside of each element. Fig. 2 shows that the boundary between two partitions of RKPM model is widely overlap zone. It is clearly show that the number of particles in the overlap zone of RKPM model is much larger than the number of particles in the boundary belt of FEM model. 4.2. The data communication scheme Because quantity of particles in boundary zones of the RKPM model is much larger than the particles of the

91

Fig. 1. A boundary line of a FEM model.

Fig. 2. A boundary zone of a RKPM model.

FEM model, the time consumed in communication during the computation of RKPM is much longer than FEM. Additional, the EBCs cannot be enforced directly as mentioned in Section 2. For problems with a large number of EBCs, however, the number of support particles can significantly increase for integration points aﬀected by the modiﬁcations in Eqs. (8) and (9). The number of shared particles may drastically increase. Therefore, communication may greatly increase thereby reducing the number of eﬀective processors in some cases. Further eﬀort regarding the accommodation of EBCs in parallel is still needed. For these reasons, proposed parallel 3D RKPM system requires constructing structures for eﬃcient data communication. For purpose of transferring the information in high eﬃciency, the system need to store following information before exchanging data. Firstly, in order to estimate time of communication, the number of particles and integration points in each overlap boundary zone among subdomains

92

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

CPU identify number

CPU identify number

Number of particles in boundary zone

Overlap zone identify number

Subdomain map table

Local information table … of subdomains

Overlap boundary identify number

Local identify number in each subdomain

Fig. 3. Data structures for transferring data in overlap zones.

Searching the overlap particles which locate in subdomain N

Searching the overlap zones which contain current overlap particles in subdomain N

Searching the subdomains which are adjacent with subdomain N

Positioning the overlap particles in adjacent subdomains

Transferring internal force of overlap particles to adjacent subdomains Fig. 4. Flow-chart of internal force transferring procedure.

must be summed. For positioning the particles needed exchanging data, the particle’s location in each subdomain requires ﬁxing and storing. Finally, we build the location mapping table to store relations of particles in each boundary zone as shown in Fig. 3 and the procedure of exchanging data of overlap zone is shown in Fig. 4. 4.3. The parallel algorithms for contact searching process To simulate the contact between a deformable body and several rigid bodies in 3D parallel RKPM, a direct approach based on the segments, particle to segment contact algorithm, is used for 3D parallel system. The deformable body is discretized by a set of particles. The boundary particles of the deformable body are referred as slave particles in this work. The surfaces of the rigid bodies (master) are modeled by quadrilateral (or triangular) segments. The particles to segment contact algorithm is based on three major procedures: the test pairs’ creation, the contact

detection and the contact forces evaluation. More details about contact algorithm are discussed in Refs. [29,30]. In common sense, master segments and slave segments of meshless models should be partitioned in synchronism, but such a strategy cannot be implemented due to following reasons: 1. Complexity of meshless parallel contact algorithms. As we know, each subdomain of meshless model has boundary domains composed of quite a few shared particles. When the master segments group and the salve particles group are partitioned in the same domains, the segments belonging to this domain should be searched by particles in limited local zone. Thus, the segments of others should be ignored. The contact search under such statement probably losses contact particles in practice. In order to solve above problems, contact searching procedure need exchanging information among all subdomains. This procedure requires complex data structure to guarantee accurate results. 2. Signiﬁcant computational eﬀort. Due to the large deformation problems, the position of each particle should be changed a lot. The initial partition results cannot ﬁt for the contact search requirements. Therefore, it requires partitioning in each time step. If the master segments and salve particles are both partitioned, the information of master segments and slave particles need updating and rearranging. So partition process will employ signiﬁcant computational eﬀort. For above reasons, we suggest the dynamic partition strategy for parallel contact problems. Only slave particles should be partitioned in each step during contract procedures. Although the search range should be expanded and more CPU time should be increased, dynamic partition strategy can ﬁnd contact pairs completely and balance the communication time of each PE. 4.4. The parallel computation of RKPM explicit dynamic analysis implemented by MPI Explicit MPI statements make all communication of parallel 3D RKPM system. In order to transfer data in

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

safety status and avoid possible deadlock and overhead associated with buﬀering, the non-blocking method is applied. Due to the explicit algorithms, the computations in each particle are independent. The developed system only needs to transfer the internal force of particles limited in overlapped boundary zones. Due to partition scheme proposed in Section 4.1, integration points are partitioned ﬁrstly; subdomains of integration points are all in nonoverlap fashion. We only require summing all internal force of each overlap boundary zone in correct position to obtain the true value. The explicit parallel RKPM algorithm of whole process is described in Fig. 5.

Input data

Partitioning integration points

93

5. Numerical examples In this section, Taylor bar impact problem is benchmark test for demonstrating the validity and eﬃciency of the parallel 3D RKPM algorithm for large deformation. Back extrusion problem as practical example is used to test the eﬃciency of proposed system for bulk forming problems. Both of commercial software and the proposed parallel 3D RKPM system run on the IBM p690 with the AIX5.2 OS in this work. In all computations, the material model is elastic–plastic with Von Mises ﬂow theory and linear isotropic hardening. Taylor bar impact problem is solved by LSDYNA3D; Back extrusion problem is solved by ANSYSY8.0, and the type 8-nodes hex solid element is used for analysis. In this paper, we apply speedup and eﬃciency to estimate the performance of parallel 3D RKPM system. For a given algorithm or computation, let Tp be the time to perform the computation using p processors or arithmetic units. Note that this makes T1 the depth of the algorithm. We deﬁne the speedup Sp [35–37] with p processors as Sp ¼

Partitioning correspond particles of integration points

T1 Tp

ð23Þ

and the eﬃciency Ep with p processors as Initialing MPI environment variables

Calculating particles’ displacement

Calculating the length of step time and broadcasting to every subdomains

Calculating the external force, internal force and contact force of every subdomains

Exchanging the internal force of overlap zones

Calculating the accelerations of every subdomains

Ep ¼

Sp : p

ð24Þ

Because p processors can do no more than p times as much work per unit time as one processor, the eﬃciency is theoretically bounded above by one. 5.1. Taylor bar benchmark test We need a benchmark test in order to verify the accuracy of 3D RKPM system and compare the eﬃciency of the various proposed parallelization methods presented further. Taylor bar impact problem which involves large deformation and moderate high strain rate is selected as benchmark. In the standard Taylor bar impact, a cylindrical bar impacts a rigid surface, as shown in Fig. 6a. Because of the symmetry of the problem only a quarter of the bar is discretized as shown in Fig. 6b. In this paper,

Calculating the displacement of every subdomains

Gathering data from all subdomains and scattering the updated data for every subdomains

Outputting data

End Fig. 5. The ﬂow chart of RKPM parallel computation process.

Fig. 6. The quarter of Taylor bar: (a) continuum domain and (b) discretized model.

94

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

Table 1 Material parameter list for Taylor bar impact Young’s modulus Poisson’s ratio Initial yield stress Hardening modulus Mass density

E t r H q

117 GPa 0.35 400 MPa 100 MPa 8930 kg/m3

Table 2 Comparison of computation eﬃciency of diﬀerent PE for Taylor bar problem

8.00 7.00 6.00 Radius (mm)

we will focus on the speedup obtained after the parallelization of the code using this benchmark test. The initial dimensions of the rod are r0Z = 3.2 mm and l0Z = 32.4 mm. The impact is assumed frictionless and the impact velocity is set to ViZ = 227 m/s. The ﬁnal conﬁguration is obtained after 80 ls. The constitutive law is elasto-plastic with a linear isotropic hardening, material properties, given in Ref.

5.00 4.00 3.00 Parallel 3D RKPM system FEM LSDYNA3D code

2.00 1.00

Number of processors

Run time (s)

Sp

Ep (%)

1 2 4 8 16 32

5323 2812 1656 830 512 220

1.00 1.81 3.21 6.41 10.39 24.14

100.00 90.5 80.35 80.16 64.64 75.5

120

0.00 0.00

10.00

20.00

30.00

40.00

50.00

60.00

70.00

80.00

Time (µs)

100 35.00

Efficiency (%)

30.00

Height (mm)

25.00 20.00

80

60

Ideal 40

15.00 Parallel 3D RKPM system FEM LSDYNA3D code

10.00 5.00

Parallel 3D RKPM system

20

0

0.00 0.00

10.00

20.00

30.00

40.00 50.00 Time (µs)

60.00

70.00

1

80.00

2

4

8

16

32

Number of Processors

Fig. 7. Taylor bar impact: time histories of the variation of the dimensions of the bar. (a) Radius and (b) height.

35 Ideal Parallel 3D RKPM system

30

Speedup

25 20 15 10 5 0

Fig. 8. Deformed conﬁgurations for Taylor impact at time 80 ls. Distribution for 16 CPUs, diﬀerent deep color means diﬀerent processes. (a) RKPM, (b) FEM and (c) Gaussian integration cells.

1

2

4

8

16

32

Number of Processors Fig. 9. Parallel performances on IBM690 of RKPM for Taylor bar problem: (a) speedup comparison and (b) eﬃciency comparison.

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

[38], corresponding to an OHFC copper are reported in Table 1. For verifying the accuracy of parallel 3D RKPM system, the commercial code LS-DYNA3D is also implemented for solving same problems. Models of meshless consist of 1369 particles for RKPM with stress point integration and 1200 elements for LSDYNA3D.

95

Fig. 7a shows the time history of the radius at the impact plane. It can be seen that the results by RKPM meshless code and LSDYNA3D FEM program are nearly the same. The results by FEM are also given. At t = 80 ls, the radius at the plane of impact is 7.30 mm with RKPM. The result by FEM is 7.38 mm. Many results for this radius have been collected in Ponthot [40] and the mean value is 7.11 mm.

Fig. 10. Schematic model for back extrusion analysis: (a) continuum domain and (b) discretized model.

Fig. 11. Deformed conﬁgurations for back extrusion process.

96

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

Fig. 7b gives the time history of the height of the bar. The results by RKPM and FEM are also nearly identical. At time t = 80 ls the height of the bar is 21.48 mm by 3D RKPM system and 21.46 mm by FEM. Fig. 8a and b shows the deformed conﬁguration at t = 80 ls by RKPM and FEM. Fig. 8c demonstrates the integration cells for each process during parallel computation, it demonstrates the proposed system has good load balancing for each process during computation. Similar deformation patterns are observed. The diﬀerences between the solutions are reasonable and this benchmark test is retained. Additional, the performance parameter Ep and Sp for diﬀerent PE number also presented in Table 2 and comparison between practical speedup and eﬃciency of parallel RKPM meshless program and ideal value are shown in Fig. 9.

5.2. Back extrusion analysis The simulation of the back extrusion process is considered. Fig. 10 shows the statement of the problem. The die is ﬁxed while the punch is moved downward with a constant velocity of 10 m/s. The coeﬃcient of friction is 0.12. The simulation time is 0.01 s. Because of the symmetry of the problem only half of the workpiece has been discretized with 5621 (73 · 7 · 11) particles and 16,000 Gaussian integration points are applied for integration. The material is the same as in Table 1. Fig. 11 shows the deformed conﬁguration at punch stroke of 0.05 m, 0.15 m. The deformed patterns are nearly the same by RKPM and FEM. Fig. 12 demonstrates a good eﬀective strain and stress distribution agreement by RKPM and FEM. However, as simulation process contin-

Fig. 12. Computed results for the compression of back extrusion at punch stroke of 0.05 m deformed models.

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101 35 Ideal Developed RKPM system

30 25

Speedup

ues at punch stroke of 0.25 m, the mesh distortion is more severe and diﬃculties occur in FEM computation as shown in bottom of Fig. 11. It can be seen that some elements in FEM solution are overlap at the punch stroke and will come to a halt. It is proved that the proposed 3D RKPM system can easy to solve element distortion problems. In order to demonstrate the eﬃciency of parallel meshless program, The 1, 2, 4, 8, 16, 32 CPUs are implemented for back extrusion problem. The performance parameters speedup Sp, eﬃciency Ef and computation time are listed in the following Table 3. The distribution of Gaussian integration cells of RKPM in each process during computation is shown in Fig. 13. Comparison among practical speedup and eﬃciency of parallel 3D RKPM system, commercial software ANSYS and ideal value are shown in Fig. 14.

97

ANSYS

20 15 10 5 0 1

2

5.3. Wheel forging analysis

4

8

16

32

16

32

Number of Processors 120

100

80 Efficiency (%)

This example aims to simulate the cold forging process of a wheel. A cylindrical slug as shown in Fig. 15a is used to obtain a wheel as shown in Fig. 15b, meshless model is presented in Fig. 15c. The following materials parameters are used for the simulation. Because of the symmetry of the problem an axisymmetric formulation is used and only half of the domain is discretized with 9610 particles. The punch and die, assumed to be rigid and frictionless, are discretized by piecewise linear segments to ﬁt the geometry of the wheel. The punch is moving downward with constant velocity of 10 m/s while the die is ﬁxed. The maximum stroke of the punch at the end of the simulation is 0.02 m. The conﬁguration at punch stroke of 0.01 m, 0.02 m and distribution of stress and strain at 0.01 m are shown in Figs. 16 and 17. It is pre-

60

40 Ideal Developed RKPM system ANSYS

20

0

Table 3 Performance parameters for various implementations of drawback extrusion problem by parallel 3D RKPM system Number of processors

Run time (s)

Sp

Ep (%)

1 2 4 8 16 32

69783.17 35479.84 20545.67 12674.82 7765.21 3116.71

1.00 1.96 3.39 5.53 8.98 22.39

100.00 98.00 84.75 69.13 56.13 69.96

1

2

4 8 Number of Processors

Fig. 14. Performance diagram for drawback extrusion: (a) speedup comparison and (b) eﬃciency comparison.

sented that proposed parallel 3D RKPM system and ANSYS8.0 have almost same results. The 1, 2, 4, 8, 16, 32 CPU are also implemented for forging problem to demonstrate eﬃciency of parallel 3D RKPM system. The performance parameters speedup Sp, eﬃciency Ef and computation time are listed in the following Table 4. The distribution of Gaussian integration cells in each process is presented in Fig. 18. Comparison between practical speedup and eﬃciency of parallel 3D RKPM system and ideal value are shown in Fig. 19. 5.4. Discussions

Fig. 13. Gaussian integration cells distributed in 16 PE at punch stroke of 0.25 m. Eﬀective strain distribution (TOP) and eﬀective stress distribution predicted by RKPM and FEM.

The capabilities of proposed parallel 3D RKPM method in simulating 3D bulk forming problems with bulk forming problems have been studied. Figs. 11, 12 and 15, 16 show that both proposed parallel 3D RKPM system and the

98

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

Fig. 15. Schematic model for back extrusion analysis: (a) raw slug, (b) geometry of the wheel and (c) discretized model.

Fig. 16. Deformed conﬁgurations for wheel forging process.

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

99

Fig. 17. Computed results for the compression of wheel forging problems at ﬁnial punch stroke deformed models: eﬀective strain distribution (TOP) and eﬀective stress distribution predicted by RKPM and FEM.

Table 4 Material parameter list for wheel forging process Young’s modulus Poisson’s ratio Mass density Stress–strain curve

E t q r = 576.79(0.01658 + ep)0.3593 MPa

71 GPa 0.3 2700 kg/m3

FEM software LSDYNA3D, ANSYS8.0 have a good convergence rate in conﬁguration and distribution of strain and stress, but the RKPM can solve the distortion of meshes easily. From Tables 2, 3 and 5, Figs. 9, 14 and 19 of above numerical examples, we can see almost ideal speedup is achieved for both the ﬁxed and scaled models even up to 32 processors, and this trend is anticipated to continue over 32 processors. Figs. 8c, 13 and 18 present good balancing rates during simulation process, each process has almost same amount of Gaussian integration cells. It is the key

Fig. 18. Gaussian integration cells distributed in 16 processes in the ﬁnial step of wheel forging process.

reason why proposed parallel 3D RKPM system has such good parallel computation eﬃciency based on MPI. Additional, although commercial FEM software (e.g. ANSYS, DEFORM) have better computation eﬃciency than developed 3D RKPM system, current system has better speedup

100

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101

processors by MRSB method and shared particles deﬁnitions are overlap, so that all support domains for all particles are deﬁned locally on the corresponding processor. Explicit MPI message passing statements are used for all communication among subdomains on distributed processors. Taylor bar impact problem is presented for proving validity of parallel 3D RKPM. Both the simulation of Taylor bar and back extrusion analysis demonstrate the eﬃciency of the current parallel RKPM for bulk forming problems. However, it should be noted that the communication execution time of meshless is much than FEM due to large overlap boundary zones.

35.00 30.00

Ideal Developed RKPM system ANSYS

Speedup

25.00 20.00 15.00 10.00 5.00 0.00 1

2

4 8 Number of Processors

16

32

120. 00

Efficiency (%)

100. 00 80. 00 60. 00 Ideal 40. 00

Developed RKPM system FEM

20. 00 0.00

1

2

4 8 Number of Processors

16

32

Fig. 19. Performance diagram for wheel forging process: (a) speedup comparison and (b) eﬃciency comparison.

Table 5 Performance parameters for various implementations of wheel forging problem Number of processors

Spend time (s)

Sp

Ef

1 2 4 8 16 32

92752.12 48308.39 27041.43 17018.74 9293.80 4193.13

1.00 1.98 3.43 5.45 9.98 22.12

100 95 85.75 68.13 62.38 69.12

and parallel eﬃciency than ANSYS. The parallel 3D RKPM system is a useful system for bulk forming problems in consideration of above factors. 6. Conclusions A parallel computational implementation of 3D RKPM methods is presented for the explicit dynamic analysis. A DDM parallel implementation is used with model partitioning for a Lagrangian formulation. With proposed strategy, integration cells are uniquely partitioned on separate

References [1] Belytschko T, Krongauz Y, Organ D, Fleming M, Krysl P. Meshless methods: An overview and recent developments’ structures. Comput Methods Appl Mech Eng 1996;139:3–47, Special issue on Meshless Methods. [2] Liu Wing Kam, Jun Sukky, Zhang Yi Fei. Reproducing kernel particle methods. Int J Numer Methods Fluids 1995;20:1081–106. [3] Liu WK, Jun S, Li S, Adee J, Belytschko T. Reproducing kernel particle methods for structural dynamics. Int J Numer Methods Eng 1998;38:1655–79. [4] Jun S, Liu WK, Belytschko T. Explicit reproducing kernel particle methods for large deformation problems. Int J Numer Methods Eng 1998;41:137–66. [5] Liu WK, Chen Y. Wavelet and multiple-scale reproducing kernel methods. Int J Numer Methods Fluids 1995;21:901–31. [6] Chen JS, Roque C, Pan C, Button ST. Analysis of metal forming process based on meshless method. J Mater Proc Technol 1998;80: 642. [7] Chen JS, Pan C, Wu CT, Roque C. A Lagrangian reproducing kernel particle method for metal forming analysis. Comput Mech 1998;19:153. [8] Kulasegaram S. Development of particle based meshless method with applications in metal forming simulations, Ph.D. Thesis, University of Wales, Swansea, 1999. [9] Bonet J, Kulasegaram S. Correction and stabilization of smooth particle hydrodynamics methods with applications in metal forming simulations. Int J Numer Methods Eng 2000;47:1189. [10] Li GY, Belytschko T. Element-free Galerkin method for contact problems in metal forming analysis. Eng Comput 2001;18:62. [11] Lou LL, Zeng P, Fang G. Meshless method and its application to bulk metal forming. J Plast Eng 2001;3:1 [in Chinese]. [12] Guo YM, Nakanishi K. A rigid-plastic meshless method for forming simulations. Eng Plast Macroscale Nanoscale PTS 1 and 2, Key Eng Mater 2003;233(2):341. [13] Yvonnet J, Chinesta F. The natural element method in metal forming simulations involving large strains. In: LUXFEM03 international conference, Luxembourg, 2003. [14] Chen JS, Wang HP, Liu WK. Meshfree method with enhanced boundary condition treatment for metal forming simulation. In: The 1999 NSF design and manufacturing grantees conference, Queen Mary, USA, 1999. [15] Chen JS, Wang HP. New boundary condition treatments in meshless computation of contact problems. Comput Method Appl Mech Eng 2000;187:441. [16] Liu GR. Mesh-free methods, moving beyond the ﬁnite element method. CRC Press; 2002. [17] Belytschko T, Lu YY, Gu L. Element free Galerkin methods. Int J Numer Methods Eng 1994;37:229–56. [18] Kent T Danielson, Hao Su, Liu Wing Kam, Aziz Uras R, Li Shoafan. Parallel computation of meshless methods for explicit dynamic analysis. Int J Numer Methods Eng 2000;47:1323–41.

H. Wang et al. / Advances in Engineering Software 38 (2007) 87–101 [19] Simon H. Partitioning unstructured problems for parallel processing. Comput Syst Eng 1991;2(2–3):135–48. [20] Lanczos C. An iteration method for the solution of the eigenvalue problem of linear diﬀerential and integral operators. J Res Nat Bur Stand 1950(45):255–82. [21] Barnard ST, Simon HD. Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurr: Practice Experience 1994;6(2):101–17. [22] Paige CC, Saunders MA. Solution of sparse indeﬁnite systems of linear equations. SINUM 1975;12:617–29. [23] Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs, Technical Report TR 95-035, Department of Computer Science, University of Minnesota, 1995. [24] Walshaw C, Cross M, Johnson S, Everett M. A parallelisable algorithm for partitioning unstructured meshes. In: Ferreira A, Rolim J, editors. Proceeding of irregular’94: parallel algorithms for irregular problems: state of the art. Dordrecht: Kluwer Academic Publishers; 1995. p. 23–44. [25] Henderson B, Leland R. A multilevel algorithm for partitioning graphs. In: Proceeding of supercomputing’95. ACM; 1995, December. [26] Kernighan B, Lin S. An eﬃcient heuristic procedure for partitioning graphs. Bell Syst Tech 1970;49(2):291–307. [27] Lucy LB. Astron Astrophys 1977;82:1013. [28] Gingold RA, Monaghan JJ. Mon Not R Astron Soc 1977;181:375. [29] Zhong Zhihua. Finite element procedures for contact-impact. Oxford University Press; 1993. [30] Belytschko T, Moran B, Liu WK. Nonlinear ﬁnite element continua and structures. Wiley, John & Sons, Incorporated; 2000. [31] Randles PW, Libersky LD. Smoothed particle hydrodynamics: some recent improvements and applications. Comput Methods Appl Mech Eng 1996;139:375–408. [32] Cambell J, Vignjevic R, Libersky L. A contact algorithm for smoothed particle hydrodynamics. Comput Methods Appl Mech Eng 2000;184:49–65. [33] Hoover CG, DeGroot AJ, Maltby JD, Procassini RJ. ParaDynDYNA3D for massively parallel computers. Engineering, Research, Development and Technology FY94, CR, 53868-94, Lawrence Livermore National Laboratory, Livermore, CA, 1995.

101

[34] Plimpton S, Attaway S, Hendrickson B, Swegle J, Vaughan C, Gardner D. Transient dynamics simulations: parallel algorithms for contact detection and smoothed particle hydrodynamics. In: Proceedings of superComputing96, Pittsburgh (PA), 1996. [35] Danielson KT, Namburu RR. Nonlinear dynamic nite element analysis on parallel computers using FORTRAN 90 and MPI. Adv Eng Softw 1998;29(3-6):179–86. [36] Williams RD. Performance of dynamic load balancing algorithms for unstructured mesh calculations. Concurr: Practice Experience 1994;3:457–81. [37] Hsieh SH. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies, Ph.D. Dissertation, Cornell University, Ithaca, New York, 1993. [38] Vignjevic R, Campbell J. A penalty approach for contact in smoothed particle hydrodynamics. Int J Impact Eng 1999;23:945–56. [39] Liu WK, Chang H, Chen JS, Belytschko T. Arbitrary Lagrangian– Eulerian Petrov–Galerkin ﬁnite elements for nonlinear continua. Comput Methods Appl Mech Eng 1988;68:259–310. [40] Ponthot JP. Mechanics of continuous solids under large strains, and uniﬁed treatment by the ﬁnite element method, Technical Report, TF38, University of Liege, Liege, 1995. [41] Wang Hu, Li Guang-Yao, Zhong Zhi-Hua. Modiﬁcation of automatic mesh grid partition in parallel ﬁnite element computation. Gongcheng Lixue/Eng Mech 2005;22:46–51. [42] Anderheggen E, Renau-Munoz JF. A parallel explicit solver for simulating impact penetration of a three-dimensional body into a solid substrate. Adv Eng Softw 2000;31:901–11. [43] Brown K, Attaway S, Plimpton S, Hendrickson B. Parallel strategies for crash and impact simulations. Comput Methods Appl Mech Eng 2000;184:375–90. [44] Jia R, Sunde’n B. Parallelization of a multi-blocked cfd code via three strategies for ﬂuid ﬂow and heat transfer analysis. Comput Fluids 2004;33:57–80. [45] Rama Mohan Rao A, Appa Rao TVSR, Dattaguru B. A new parallel overlapped domain decomposition method for nonlinear dynamic ﬁnite element analysis. Comput Struct 2003;81:2441–54. [46] Wang Hu, Li Guangyao, Zhong Zhihua. Optimization on automatic mesh partition method in parallel ﬁnite element computation. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/J Comput Aided Des Comput Graph 2005;17(8):1766–72.

Development of parallel 3D RKPM meshless bulk forming simulation system

Development of parallel 3D RKPM meshless bulk forming simulation system

Recommend Documents