Future Generation Computer Systems 18 (2001) 107–114

GeoFEM: high performance parallel FEM for solid earth

K. Garatani a,∗, H. Nakamura a, H. Okuda b, G. Yagawa c

a Department of Research for Computational Earth Science, Research Organization for Information Science and Technology (RIST), 1-18-16 Hamamatsucho Minato-ku, Tokyo 105-0013, Japan
b Department of Mechanical Engineering and Materials Science, Yokohama National University, 79-5 Tokiwadai Hodogaya-ku, Yokohama 240-0067, Japan
c Department of Quantum Engineering and Systems Science, The University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo 113-8656, Japan

∗ Corresponding author.

Abstract

The Science and Technology Agency of Japan began the Earth Simulator project in the 1997 fiscal year, with the goal of forecasting various earth phenomena through simulation. GeoFEM is parallel finite element software for solving problems involving the solid earth. In this paper, we briefly describe the GeoFEM project and discuss GeoFEM's capability for large-scale analysis together with the results of simple benchmark analyses. At this stage, the largest linear elastic problem that has been solved by GeoFEM involves more than 10^8 degrees of freedom (DOF). In the 2001 fiscal year, we will attempt to solve a 10^10 DOF problem using the Earth Simulator. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Parallel; Finite element method; Large-scale computing; Geophysics

1. Introduction

In order to solve global environmental problems and prepare for managing natural disasters, the Science and Technology Agency is attempting to predict global change through a combination of global observation, global change process research and computer simulation. In the 1997 fiscal year, the agency began a five-year project to develop an Earth Simulator that would enable various earth phenomena to be forecast through simulation using a virtual earth placed in a supercomputer. It is considered to be one of the top priority projects on the national scientific agenda in Japan. The specific research topics of the project include four major themes: development of an Earth Simulator, modeling and simulation of atmospheric or oceanic fields, modeling and simulation of the solid earth field, and development of large-scale parallel software. GeoFEM (http://geofem.tokyo.rist.or.jp) is the name chosen for both the project and the software system covering the modeling and simulation of the solid earth and the development of a large-scale parallel software system. The GeoFEM system [1,2] is a parallel finite element analysis system intended for application to multi-physics/multi-scale problems and is being developed at the Research Organization for Information Science and Technology (RIST). Since a number of models have not yet been completely established in the solid earth field, the simulation must be carried out on a trial and error basis. Therefore, a joint venture with a geophysical modeling research group is crucial for the development of a targeted simulation software system. The GeoFEM project is expected to be a breakthrough in bridging the geoscience and information science fields.




2. Overview of the GeoFEM project

2.1. Design of GeoFEM

The GeoFEM group is planning to develop the software system in the following two phases:

1. GeoFEM/Tiger (1997–1998). This phase involves the development of multi-purpose parallel finite element software which may be applied to various scientific fields and which will become the basis for the Earth Simulator program to be developed in the next phase. A platform into which various solid earth models may be read in a plug-in style will be developed.
2. GeoFEM/Snake (1999–2001). A software system specifically for the simulation of solid earth phenomena, built on GeoFEM/Tiger, will be developed in this phase. Interactions between the different scales will be considered, and the software will be optimized for the Earth Simulator.

2.2. Objectives of GeoFEM/Tiger

1. Analysis scale. Linear problems of up to 10^8 degrees of freedom (DOF) are to be solved within a practically reasonable length of time using currently available parallel computers.
2. Physical phenomena. The physical phenomena of the solid earth to be solved by the GeoFEM system are shown in Fig. 1. On the global scale, the dynamics of the whole earth, involving mantle convection, will be simulated as heat–matter transfer in a viscous fluid. On the Japanese islands scale (regional), the motion of dislocations, the buildup of tectonic stresses and the deformation of plates will be simulated by modeling these phenomena as static elastic or viscoelastic problems. On the local scale, the simulation will involve the solution of seismic wave propagation by dynamic structural analysis. These interface areas, including the couplings, will be implemented at the GeoFEM/Snake stage.

Fig. 1. Space/time scale of solid earth to be solved by GeoFEM.

2.3. Development environment for GeoFEM/Tiger

2.3.1. Hardware environment

The Earth Simulator will be composed of 640 nodes connected by a crossbar network, in which each node has eight vector processors coupled by a shared memory. As a system, it is expected that a peak processing speed of over 40 TFLOPS will be achieved; the system is therefore sometimes referred to as the GS40 (Gaea Simulator 40 TFLOPS). It is also expected that approximately 10 TB of memory capacity will be supplied. Until this hardware becomes available, development of the software system will be carried out on existing hardware platforms. When the GS40 does become available, the software will be improved in order to obtain a higher level of optimization.

2.3.2. Software environment

Language for the simulation software. For the main calculations, FORTRAN 90 will be used so that floating point operation speed can be maintained. However, this may not be the case for system utilities or visualization, which will be coded in C or C++. Data parallelization will be implemented using MPI as the message passing library (a minimal sketch follows this subsection).

Device-independent interface between the subsystems. By providing the specifications for a device-independent data transfer method, subsystems will be isolated from one another, which will raise their level of independence during development. Utility routines will therefore be provided that allow the system to switch easily between input or output via device files, message transfer or memory.
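To make the message-passing style concrete, the following is a minimal sketch in C (one of the languages named above) of how values on overlapped subdomain boundaries might be exchanged with MPI. It is not GeoFEM source code; the ring neighbour pattern and the buffer size are assumptions made only for illustration.

/* Minimal sketch (not GeoFEM source): data-parallel exchange of overlapped
 * subdomain values with MPI, as described for the software environment.
 * The buffer size and ring-neighbour pattern are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>

#define N_OVERLAP 4   /* hypothetical number of overlapped boundary nodes */

int main(int argc, char **argv)
{
    int rank, size;
    double send[N_OVERLAP], recv[N_OVERLAP];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Fill the overlap buffer with locally computed values (dummy data here). */
    for (int i = 0; i < N_OVERLAP; i++)
        send[i] = rank + 0.1 * i;

    /* Exchange overlap data with the neighbouring subdomain in a ring. */
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;
    MPI_Sendrecv(send, N_OVERLAP, MPI_DOUBLE, next, 0,
                 recv, N_OVERLAP, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received first overlap value %.1f from rank %d\n",
           rank, recv[0], prev);

    MPI_Finalize();
    return 0;
}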


Fig. 2. System configuration of GeoFEM/Tiger.

2.3.3. Organization

The members developing the GeoFEM system are based at RIST; the team consists of researchers from universities and institutes and software engineers from private companies. Their positions are endorsed by RIST, and they devote themselves to research and development for the GeoFEM project.

2.4. System composition of GeoFEM/Tiger

The GeoFEM/Tiger system will be composed of the subsystems and integrated environment illustrated in Fig. 2. Each subsystem operates on the assumption that the subsystems are linked by device-independent interfaces. Copying of data will be avoided because of the large scale of the data and the pursuit of high-speed computation.

1. Partitioning subsystem. Graph partitioning is carried out based on the data from the large-scale complex CAD and mesh generator. The partitioning data, including the neighboring information, will be transferred to the succeeding subsystems. Nodal partitioning with overlapped subdomains will be taken into consideration. Generally, partitions with a larger number of edge-cuts lead to poor convergence. In GeoFEM, the Greedy or recursive coordinate bisection (RCB) method is supplied as a graph partitioner (a minimal RCB sketch is given after this list). METIS is also available as a partitioning tool for complicated geometry.
2. Structural analysis subsystem. Static or assumed-time-dependent algorithms will be used for linear elastic [3], viscoelastic and elastoplastic cases in order to handle various analysis conditions. Contact analysis [4] will be supported for modeling the dislocations. Since the stiffness matrix is created locally and then transferred to the solver subsystem in the form of a subroutine call, message passing is not apparent within the subsystem.
3. Thermal flow analysis subsystem. This subsystem will include a function for incompressible viscous flow analysis. Mantle convection is the movement of various kinds of material driven by the higher temperatures in the inner part of the earth. The equations that govern the process may be expressed using time-, space- and temperature-dependent rheology or viscoelastic models. The present stage of the flow modeling in GeoFEM involves coupled heat and viscosity analysis using a parallelized FEM code. This subsystem also creates the matrix locally and transfers it to the solver, so no explicit parallel procedure is required within the subsystem.
4. Solver subsystem. In large-scale scientific computing, the linear sparse solver is one of the most time-consuming processes. In GeoFEM, various types of preconditioned iterative methods (CG, BiCGSTAB, GPBiCG, GMRES; [5]) are implemented on massively parallel computers. ILU(0) factorization is a very effective preconditioning method for an iterative solver. However, this method has global data dependencies, so it is not the optimal methodology on parallel computers, where locality is of utmost importance. In GeoFEM, the localized ILU(0) preconditioning method [6] has been implemented for various types of iterative solvers [7,8]. This method provides data locality on each processor and a good parallelization effect.
5. Visualization subsystem. Parallel image processing for large-scale unstructured mesh data, including data reduction and extraction, will be carried out in this subsystem [9]. In this version, the image processing calculation is performed in a separate phase from the analysis.
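As referenced in item 1 above, the following is a minimal sketch in C of recursive coordinate bisection over a set of nodal coordinates. It is not GeoFEM source code: the random coordinates, the axis-of-largest-extent rule and the contiguous partition numbering are assumptions for illustration; in practice the Greedy method or METIS mentioned above would be used for complicated geometry.

/* Minimal sketch (not GeoFEM source) of recursive coordinate bisection (RCB),
 * one of the partitioners named above. Node coordinates are random here; a real
 * partitioner would read them from the mesh generator output. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { double x[3]; int part; } Node;

static int cmp_axis;
static int cmp_nodes(const void *a, const void *b)
{
    double da = ((const Node *)a)->x[cmp_axis];
    double db = ((const Node *)b)->x[cmp_axis];
    return (da > db) - (da < db);
}

/* Recursively bisect nodes[0..n) into 'nparts' contiguous partitions,
 * cutting along the coordinate axis with the largest extent. */
static void rcb(Node *nodes, int n, int nparts, int first_part)
{
    if (nparts <= 1 || n <= 1) {
        for (int i = 0; i < n; i++) nodes[i].part = first_part;
        return;
    }
    /* Choose the axis with the largest coordinate extent. */
    int axis = 0;
    double best = -1.0;
    for (int d = 0; d < 3; d++) {
        double lo = nodes[0].x[d], hi = nodes[0].x[d];
        for (int i = 1; i < n; i++) {
            if (nodes[i].x[d] < lo) lo = nodes[i].x[d];
            if (nodes[i].x[d] > hi) hi = nodes[i].x[d];
        }
        if (hi - lo > best) { best = hi - lo; axis = d; }
    }
    /* Sort along that axis and cut so that the node counts match the split. */
    cmp_axis = axis;
    qsort(nodes, n, sizeof(Node), cmp_nodes);
    int left_parts = nparts / 2;
    int cut = (int)((long long)n * left_parts / nparts);
    rcb(nodes, cut, left_parts, first_part);
    rcb(nodes + cut, n - cut, nparts - left_parts, first_part + left_parts);
}

int main(void)
{
    enum { N = 1000, NPARTS = 8 };
    Node nodes[N];
    for (int i = 0; i < N; i++)
        for (int d = 0; d < 3; d++)
            nodes[i].x[d] = (double)rand() / RAND_MAX;  /* dummy coordinates */
    rcb(nodes, N, NPARTS, 0);

    int count[NPARTS] = {0};
    for (int i = 0; i < N; i++) count[nodes[i].part]++;
    for (int p = 0; p < NPARTS; p++)
        printf("partition %d: %d nodes\n", p, count[p]);
    return 0;
}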



Fig. 3. System configuration of GeoFEM/Tiger.

Subsystems (1)–(5) will be controlled using shells. An object-oriented system configuration is made possible by integrating the subsystems in a device-independent fashion [10]. Furthermore, different types of data, such as geographical, CAD, FEM and visualization data, will be managed using a data structure model to be developed in the future.

2.5. Features of the completed GeoFEM/Tiger

At the end of the 1998 fiscal year, development of GeoFEM/Tiger was completed. The dominant features of GeoFEM/Tiger are described below:

1. Pluggable design. The architecture of GeoFEM/Tiger is shown in Fig. 3. The system is composed of a utility, an analyzer, a visualizer and a solver. GeoFEM/Tiger is designed to be a parallel FEM platform; therefore, arbitrary analysis code can be plugged into the system as an analyzer by conforming to the data structure and calling sequence (a hypothetical sketch of such an interface is given after this list). This feature is supported by a localized data structure and by system utility routines for input and output.
2. High performance. The structural subsystem of GeoFEM/Tiger solved problems involving more than 10^8 DOF using a 1000-processor-element computer with high parallel efficiency. This result is described in the next section.
3. Open to the public. In this project, all results, including source code, example data and related documents, are made available on the Internet.
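As referenced in item 1 above, the pluggable design can be pictured as a small table of callbacks that every analyzer provides. The sketch below, in C, is entirely hypothetical: the structure and function names are invented for illustration and do not correspond to GeoFEM's actual data structure or calling sequence.

/* Hypothetical sketch (not GeoFEM's actual API) of the pluggable analyzer idea:
 * an analysis code is registered with the platform by filling in a small table
 * of callbacks that conform to an agreed data structure and calling sequence. */
#include <stdio.h>
#include <stddef.h>

/* Agreed-upon local (per-subdomain) data structure handed to the analyzer. */
typedef struct {
    int     n_local_nodes;
    double *coords;        /* 3 * n_local_nodes nodal coordinates            */
    double *solution;      /* 3 * n_local_nodes unknowns, filled by the solver */
} LocalMesh;

/* Calling sequence every pluggable analyzer must provide. */
typedef struct {
    const char *name;
    int (*assemble)(const LocalMesh *mesh);    /* build the local stiffness matrix */
    int (*postprocess)(const LocalMesh *mesh); /* hand results to the visualizer   */
} Analyzer;

/* A toy linear elastic analyzer plugged into the platform. */
static int elastic_assemble(const LocalMesh *mesh)
{
    printf("assembling local matrix for %d nodes\n", mesh->n_local_nodes);
    return 0;
}
static int elastic_post(const LocalMesh *mesh)
{
    printf("postprocessing %d nodes\n", mesh->n_local_nodes);
    return 0;
}

int main(void)
{
    Analyzer analyzer = { "linear elastic", elastic_assemble, elastic_post };
    LocalMesh mesh = { 8, NULL, NULL };        /* dummy subdomain */

    /* The platform drives any analyzer through the same calling sequence. */
    printf("running analyzer: %s\n", analyzer.name);
    analyzer.assemble(&mesh);
    /* ... the solver subsystem would run here ... */
    analyzer.postprocess(&mesh);
    return 0;
}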

3. Large-scale analysis feature of GeoFEM

In this section, the large-scale analysis feature of the GeoFEM structural subsystem is discussed.

3.1. Analysis methods for parallel FEM

For parallel FEM, the domain decomposition method (DDM) is generally used [11]. In this method, the entire model is decomposed into a number of subdomains, each assigned to a processing element (PE). In the GeoFEM system, the number of decomposed subdomains is kept equal to the PE count in order to maintain large granularity. Two types of DDM algorithm are used: one solves all DOF and the other solves only the boundary DOF. When the former is adopted, an iterative solver such as the CG method is usually used. The latter involves two approaches: composing the reduced matrix explicitly (the substructure method, SSM) or converging the boundary forces using an iterative solver [12,13]. There are several kinds of DDM algorithms, each having its own advantages and disadvantages [14,15]. The SSM is not suitable for large-scale analyses. GeoFEM adopts the DDM that solves all DOF by an iterative method, because of its stability and applicability to nonlinear problems. In this method, proper partitioning and preconditioning are the key technologies for stable and fast calculation. For the partitioning, the Greedy or RCB method was adopted, and localized ICCG was selected as the solver at this stage; a minimal preconditioned CG sketch is given below.
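As an illustration of the iterative approach adopted above, the following is a minimal sketch in C of a preconditioned conjugate gradient solver. It is not GeoFEM source code: a simple diagonal (Jacobi) preconditioner stands in for GeoFEM's localized ILU(0)/IC preconditioning, and a small 1D Laplacian matrix stands in for a FEM stiffness matrix; both substitutions are assumptions made for brevity.

/* Minimal sketch (not GeoFEM source) of a preconditioned conjugate gradient
 * solver. A diagonal (Jacobi) preconditioner replaces GeoFEM's localized
 * ILU(0)/IC preconditioning, and a 1D Laplacian replaces the FEM stiffness
 * matrix; both are illustrative assumptions. */
#include <stdio.h>
#include <math.h>

#define N 100        /* hypothetical problem size */

/* y = A x for the 1D Laplacian: 2 on the diagonal, -1 on the off-diagonals. */
static void matvec(const double *x, double *y)
{
    for (int i = 0; i < N; i++) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < N - 1) y[i] -= x[i + 1];
    }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i] * b[i];
    return s;
}

int main(void)
{
    double x[N] = {0}, b[N], r[N], z[N], p[N], q[N];
    for (int i = 0; i < N; i++) b[i] = 1.0;   /* dummy right-hand side */

    /* r = b - A x (x = 0), z = M^{-1} r with M = diag(A) = 2 I. */
    for (int i = 0; i < N; i++) { r[i] = b[i]; z[i] = r[i] / 2.0; p[i] = z[i]; }
    double rho = dot(r, z);

    for (int iter = 0; iter < 10000; iter++) {
        matvec(p, q);
        double alpha = rho / dot(p, q);
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        if (sqrt(dot(r, r)) < 1.0e-10) {
            printf("converged after %d iterations\n", iter + 1);
            break;
        }
        for (int i = 0; i < N; i++) z[i] = r[i] / 2.0;   /* apply preconditioner */
        double rho_new = dot(r, z);
        double beta = rho_new / rho;
        rho = rho_new;
        for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
    }
    printf("x[0] = %f, x[N/2] = %f\n", x[0], x[N / 2]);
    return 0;
}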



Fig. 4. Simplified FEM model.

3.2. Resource estimation for large-scale analysis

In this section, we estimate the computer resources needed for a 10^8 DOF linear elastic problem using the method described in the previous section. For the FEM mesh, we use a simple unit cubic model, as shown in Fig. 4. The model has sides of unit length, and the number of DOF can easily be changed by varying the division count of each side. Memory capacity and elapsed time are the dominant problems when dealing with large-scale FEM analyses, and most of the memory is used by the solver. Therefore, we estimate the memory capacity and elapsed time required by the solver.

3.2.1. Memory capacity

For the structural analysis, the memory required by the iterative solver is estimated from the total amount of memory required for the entire stiffness matrix. The term count of the matrix is calculated from the number of connected nodes at each DOF; the term count at a particular DOF is given by DOF per node × connected node count. When the unknowns are the displacements (ux, uy and uz at each node), the DOF per node is three. For the model shown in Fig. 4, the number of connected nodes per node is 27. Each term requires one double-precision real and one integer variable, which comes to 12 bytes per term, so each DOF requires (3 × 27) × 12 = 972 bytes. Therefore, the memory capacity needed to express the entire stiffness matrix becomes 972 × DOF bytes. Taking other variables into account, a rough estimate is expressed as follows:

Required memory at solver = 1000 × DOF (bytes).    (1)
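As a quick check of Eq. (1), the following sketch in C compares the estimated solver memory with the two machines discussed in the text; the machine figures are those quoted in the text, and the comparison itself is only illustrative.

/* Minimal sketch evaluating the solver memory estimate of Eq. (1),
 * memory ≈ 1000 × DOF bytes, against the machines mentioned in the text. */
#include <stdio.h>

static void check(const char *name, double dof, double pe, double mem_per_pe_bytes)
{
    double required  = 1000.0 * dof;             /* Eq. (1): bytes needed by the solver */
    double available = pe * mem_per_pe_bytes;    /* total distributed memory under DDM  */
    printf("%s: %.1e DOF needs %.1e bytes, %.1e bytes available -> %s\n",
           name, dof, required, available, required <= available ? "fits" : "does not fit");
}

int main(void)
{
    check("SR2201 (1024 PE x 256 MB)", 1.0e8,  1024.0, 256.0 * 1024 * 1024);
    check("GS40   (10 TB total)",      1.0e10, 1.0,    10.0e12);
    return 0;
}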

This relationship is shown in Fig. 5. In this figure, the total memory capacity is PE count × memory per PE in the case of the DDM. Using the SR2201 at the University of Tokyo, which has 1024 PE and 256 MB of memory per PE, we can compute a 10^8 DOF analysis. Using the Earth Simulator (GS40), which has 10 TB of memory, we will attempt to solve a 10^10 DOF problem.

Fig. 5. Required memory estimation.

3.2.2. Elapsed time

For the elapsed time estimation, we measured the wall clock time for various DOF using a single PE of the SR2201. The model is that shown in Fig. 4, and the method used for the solver is ICCG. According to this parametric study, the iteration count is expressed by a power of the DOF, and the elapsed time per iteration is proportional to the DOF. The resulting fit is shown in the following equation:

Required time at solver = 6.65 × 10^-5 × DOF^1.321 (s).    (2)

The one-PE line of Fig. 6 is a plot of the equation above. The figure includes lines for up to 10,000 PE, which are obtained simply by dividing the one-PE value by the PE count, assuming 100% parallel efficiency. As a result, when the SR2201 with 1000 PE is used, we can analyze a 100 million DOF problem in about 50 min. Although this estimate is a lower bound, if we can maintain an efficiency of at least several tens of percent, the analysis will be completed within a reasonable time.
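The following sketch in C evaluates Eq. (2) together with the simple division by the PE count described above; it gives a time of the same order as the roughly 50 min quoted for a 10^8 DOF analysis on 1000 PE.

/* Minimal sketch evaluating the elapsed-time fit of Eq. (2),
 * t(DOF) = 6.65e-5 * DOF^1.321 seconds on one PE of the SR2201,
 * divided by the PE count under the text's 100% efficiency assumption. */
#include <stdio.h>
#include <math.h>

static double solver_seconds(double dof, double pe)
{
    return 6.65e-5 * pow(dof, 1.321) / pe;   /* Eq. (2) scaled by the PE count */
}

int main(void)
{
    printf("1e8 DOF on    1 PE: %.0f s (about %.0f days)\n",
           solver_seconds(1.0e8, 1.0),    solver_seconds(1.0e8, 1.0) / 86400.0);
    printf("1e8 DOF on 1000 PE: %.0f s (about %.0f min)\n",
           solver_seconds(1.0e8, 1000.0), solver_seconds(1.0e8, 1000.0) / 60.0);
    return 0;
}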



Fig. 6. Time estimation by SR2201.

The floating point operation count at the solver is easily calculated because of the simple shape of the cubic model in Fig. 4. We therefore roughly counted the floating point operations from the source code. The result shows that, in this problem, the number of floating point operations per iteration is approximately 336 times the number of DOF. Since the iteration count was already estimated as a function of the DOF in order to obtain Eq. (2), we can estimate the total floating point operation count. To obtain the elapsed time from the estimated operation count, we assume several computational speeds expressed in FLOPS (floating point operations per second). These relationships are shown in Fig. 7. For the analysis of 10^8 DOF, the calculation speed at the solver is therefore 11.3 GFLOPS. Since the peak speed of one PE of the SR2201 is 300 MFLOPS, giving 300 GFLOPS for 1000 PE, this is approximately 4% of peak. This value is very low because no optimization work has been done at the GeoFEM/Tiger stage. The GS40, which is expected to have an effective speed of several TFLOPS, should be able to analyze 10^10 DOF in a practical elapsed time, as shown in Fig. 7. For such an extremely large problem, we also have to consider precision problems due to the use of double-precision real variables.

Fig. 7. Time estimation.

3.3. Benchmark analysis

Fig. 8 shows the elapsed time of the benchmark analyses on the SR2201 at the University of Tokyo, up to about 10^8 DOF for the cubic model shown in Fig. 4. These analyses fixed the nodal division count per side of a cubic subdomain, ns, at 11, 22, 33 and 37, while the PE count was varied over 1, 2^3 = 8, 3^3 = 27, 4^3 = 64, 5^3 = 125 and 6^3 = 216. Furthermore, for ns = 33, PE counts of 8^3 = 512 and 10^3 = 1000 were also applied. For the analysis using 1000 PE, the number of DOF becomes 33^3 × 3 × 1000 = 107,811,000.

Fig. 8. Elapsed time.

Fig. 9 presents the scalability, assuming the elapsed time on one PE is given by Eq. (2). Each line of Fig. 9 indicates the scalability under the condition that the granularity remains constant, because each series fixed the scale of the problem per PE (i.e., ns) and increased the number of PEs. The speed-up approaches the ideal value as the granularity becomes larger and eventually saturates at a line indicating approximately 60% efficiency. The rate of CPU usage measured in these analyses exceeds 96% in the case of ns = 33. Because GeoFEM uses the localized ILU(0) preconditioning, the number of iterations required to reach a given accuracy tends to increase in parallel computation owing to the loss of the preconditioning effect [7,8]. This effect is the origin of the difference between the parallel efficiency and the rate of CPU usage. However, this increase in the number of iterations is not significant, and GeoFEM provides almost linear scalability and relatively good parallel performance.

Fig. 9. Scalability.
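The problem-size bookkeeping above can be reproduced in a few lines. The following sketch in C computes the DOF count for a given ns and PE count, together with the ideal (100% efficient) solver time from Eq. (2); the formula DOF = 3 × ns^3 × PE simply restates the arithmetic used in the text.

/* Minimal sketch of the benchmark bookkeeping: for a cubic subdomain with ns
 * divisions per side on each of pe PEs, the problem size is 3 * ns^3 * pe DOF
 * (as computed in the text), and the ideal solver time is Eq. (2) / pe. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int ns = 33;                         /* nodal divisions per subdomain side    */
    int pe = 1000;                       /* 10^3 PEs, the largest run in the text */
    double dof = 3.0 * ns * ns * ns * pe;
    double ideal_s = 6.65e-5 * pow(dof, 1.321) / pe;   /* Eq. (2), ideal scaling */

    printf("ns = %d, %d PE -> %.0f DOF, ideal solver time %.0f s\n",
           ns, pe, dof, ideal_s);
    return 0;
}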

4. Concluding remarks

In the present paper, an overview of GeoFEM, a parallel finite element analysis system intended for the simulation of multi-physics/multi-scale problems involving the solid earth on the Earth Simulator, was given and its capability for large-scale analysis was reported. The system is currently under development and refinement and has already achieved a 10^8 DOF analysis for a linear elastic problem. In the 2001 fiscal year, we will attempt to solve a 10^10 DOF problem using the Earth Simulator. GeoFEM/Tiger is multi-purpose parallel finite element software that may become a standard in the parallel finite element analysis field by implementing new technologies such as data structures and multi-level solver approaches for parallel environments, memory management to maximize MPP performance, parallel image processing of unstructured mesh data and a pluggable subsystem configuration.

References

[1] G. Yagawa, H. Okuda, H. Nakamura, GeoFEM: multi-purpose parallel FEM system for solid earth, World Conference on Computational Mechanics IV, Vol. 2, 1998, pp. 1048–1053.
[2] M. Iizuka, H. Nakamura, K. Garatani, K. Nakajima, H. Okuda, G. Yagawa, GeoFEM: high performance parallel FEM for geophysical applications, in: Proceedings of the Second International Symposium on High Performance Computing, Lecture Notes in Computer Science, Vol. 1615, Springer, Berlin, 1999, pp. 292–303.
[3] K. Garatani, GeoFEM: multi-purpose parallel FEM for solid earth. 3. Large-scale structural analysis for solid earth, in: Proceedings of the Conference on Computational Engineering and Science, Vol. 3, Japan Society for Computational Engineering and Science, 1998.
[4] M. Iizuka, GeoFEM: multi-purpose parallel FEM for solid earth. 4. Complex earth structure analysis, in: Proceedings of the Conference on Computational Engineering and Science, Vol. 3, Japan Society for Computational Engineering and Science, 1998.
[5] J.J. Dongarra, et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA, 1994.
[6] J.J. Dongarra, et al., Solving Linear Systems on Vector and Shared Memory Computers, SIAM, Philadelphia, PA, 1990.
[7] K. Nakajima, et al., Parallel iterative solvers with localized ILU preconditioning, in: Proceedings of the HPCN'97, Lecture Notes in Computer Science, Vol. 1225, Springer, Berlin, 1997, pp. 342–350.
[8] K. Nakajima, H. Okuda, Parallel iterative solvers with localized ILU preconditioning for unstructured grids on workstation clusters, Int. J. Comput. Fluid Dyn. 12 (1999) 315–322.
[9] Y. Takeshima, I. Fujishiro, GeoFEM: multi-purpose parallel FEM for solid earth. 7. Unstructured volume visualization approaches and issues, in: Proceedings of the Conference on Computational Engineering and Science, Vol. 3, Japan Society for Computational Engineering and Science, 1998.
[10] D. Sekita, GeoFEM: multi-purpose parallel FEM for solid earth. 6. A strategy of the inter/intra system interfaces, in: Proceedings of the Conference on Computational Engineering and Science, Vol. 3, Japan Society for Computational Engineering and Science, 1998 (in Japanese).
[11] C. Farhat, A simple and efficient automatic FEM domain decomposer, Comput. Struct. 28 (1988) 579–602.



[12] R. Glowinski, Q.V. Dinh, J. Periaux, Domain decomposition methods for non-linear problems in fluid dynamics, Comput. Meth. Appl. Mech. Eng. 40 (1983) 27–109.
[13] G. Yagawa, N. Soneda, S. Yoshimura, A large-scale finite element analysis using domain decomposition method on a parallel computer, Comput. Struct. 38 (1991) 615–625.
[14] K. Garatani, et al., Study on parallelization method of structural analysis code, in: Proceedings of the HPCN'97, Lecture Notes in Computer Science, Vol. 1225, Springer, Berlin, 1997, pp. 1044–1046.
[15] K. Garatani, H. Nakamura, G. Yagawa, Parallel finite element structural analysis code using DDM, in: Proceedings of the HPCN'98, Lecture Notes in Computer Science, Vol. 1401, Springer, Berlin, 1998, pp. 887–889.

Kazuteru Garatani is a Principal Research Engineer at the Research Organization for Information Science and Technology (RIST) in Tokyo, Japan. He received his Master’s degree in engineering from Osaka University in 1986. Thereafter he worked at CRC Research Institute, Power Reactor and Nuclear Fuel Development Corporation and RIST in the field of software engineering and inelastic analysis. His recent papers are on large-scale parallel finite element analysis and its applications in the fields of engineering and science. His web page is http://geofem.tokyo.rist.or.jp.

Hisashi Nakamura is the Director of the Computational Earth Science Division of Research Organization for Information Science and Technology (RIST). He received his Master’s degree in mechanical engineering from Ibaraki University in 1975 and served as a Nuclear Safety Researcher for Power Reactor and Nuclear Fuel Development Corporation from 1975 to 1995. He now works for a project on large-scale simulations of the earth phenomena through high performance and parallel computing. His web page is http://geofem.tokyo.rist.or.jp.

Hiroshi Okuda is an Associate Professor at the Department of Mechanical Engineering, Yokohama National University. He is the co-leader of the GeoFEM project, serving as an Invited Researcher at RIST. His current research interests are parallel finite element computing in engineering and the sciences, meshless methods, soft computing and solid earth simulations. He received his PhD from the University of Tokyo in 1990 for research on the application of optimization techniques to finite element fluid analysis. After working at the University of Tokyo as an Associate Professor, he moved to Yokohama National University in 1996. His recent papers are on parallel finite element analysis and its applications to engineering and solid earth issues, meshless methods and neural-net computing. His web page is http://typhoon.cm.me.ynu.ac.jp or http://geofem.tokyo.rist.or.jp.

Genki Yagawa is a Professor at the Department of Quantum Engineering and Systems Science, University of Tokyo. He is a Member of the Engineering Academy of Japan and Executive Council of International Association for Computational Mechanics, and a Fellow of the American Society of Mechanical Engineers. He has served as the Vice-President of the Japan Society for Simulation Technology, the Japan Society for Industrial and Applied Mathematics and Japan Society for Computational Engineering and Science. He is also the Founder and the first Chair of the JSME Computational Mechanics Division and a Member of the Board of Directors of the Japan Society of Mechanical Engineers and the Atomic Energy Society of Japan. He is the Co-Editor-in-Chief of the International Journal for Computational Mechanics and the General Co-Chairman of the International Conference Series on Computational Engineering Science (ICES). He is also the leader of the GeoFEM project. He received his Bachelor’s degree in 1965, Master’s in 1968 and Doctorate in 1970, all in the field of engineering from University of Tokyo. His web page is http://garlic.q.t.u-tokyo.ac.jp/.