Journal of Computational and Applied Mathematics (
)
–
Contents lists available at ScienceDirect
Journal of Computational and Applied Mathematics journal homepage: www.elsevier.com/locate/cam
Scalable parallel implementation of shooting method for large-scale dynamical systems. Application to bridge components S. Stoykov ∗ , S. Margenov Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev str., bl. 25A, 1113 Sofia, Bulgaria
article
info
Article history: Received 8 December 2014 Received in revised form 28 March 2015 Keywords: Periodic response Frequency-response function Real-life structures High-performance computing Scalability
abstract The mathematical models applied to real-life engineering structures result into large-scale dynamical systems. The efficient computation of their dynamical characteristics requires the usage of advanced numerical methods with parallel algorithms. The shooting method, in combination with a continuation method, presents a powerful tool for analyzing and investigating the dynamical characteristics of such systems. The efficiency and scalability of the shooting method are analyzed in the current paper for linear and nonlinear equations. Its applicability to large-scale systems is demonstrated by structural component of bridge discretized by three-dimensional finite elements. © 2015 Elsevier B.V. All rights reserved.
1. Introduction The knowledge of dynamical behavior of elastic structures is essential for their design, analysis and maintenance. Different approaches and mathematical models have been developed for that purpose in the last decades. Enormous attention was paid to nonlinear dynamical systems because of their complex behavior and the necessity of efficient numerical methods for solving the equations. The main characteristics of linear and nonlinear dynamical systems are summarized briefly in this paragraph. The principle of superposition holds for linear systems, the time response due to harmonic excitation is also harmonic, with frequency equal to the excitation frequency, and the steady-state solution is unique and independent on the initial conditions. For nonlinear systems, the principle of superposition does not hold, the response of nonlinear systems can be periodic, quasi periodic or chaotic and there might exists more than one steady-state solution, which depends on the initial conditions. In addition, the nonlinear systems can have bifurcation points which can change the solution significantly. The nonlinear systems can exhibit complex motions which linear systems cannot. More details about the characteristics of nonlinear dynamical systems can be found in [1,2]. The linear systems are much easier to solve, but they have limited application. They are appropriate for small displacements. In many cases linearity is a rude approximation of the real problems. The necessity of using nonlinear models comes from the requirement of engineers to use more accurate and better physical models. In most cases, it is impossible to obtain analytical solution of nonlinear equations, thus one has to use appropriate numerical methods [3]. Apart from the tendency of developing more accurate physical models by considering nonlinear terms in the equation of motion, another necessity for the engineers is to obtain more precise solutions of nonlinear partial differential equations
∗
Corresponding author. E-mail addresses:
[email protected] (S. Stoykov),
[email protected] (S. Margenov).
http://dx.doi.org/10.1016/j.cam.2015.04.015 0377-0427/© 2015 Elsevier B.V. All rights reserved.
2
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
(PDE) on complex domains. A standard technique for transforming the PDE into a system of ordinary differential equations (ODE) is the finite element method [4]. Accuracy is achieved by using a fine mesh of finite elements or by increasing the order of the polynomials, used as shape functions. Both cases result into large systems of ODE. Analyzing the dynamical behavior of large-scale systems of nonlinear ordinary differential equations is a challenging and ambitious task. The variation of the solution with the excitation frequency (or frequency of vibration) is of primary interest for the analysis of the dynamical behavior of elastic structures [5]. The concepts of nonlinear normal modes (NNM) and nonlinear frequency-response function (FRF) have become standard tools for such analyses [6]. Each point from the frequency–amplitude (or frequency–energy) domain presents a periodic solution. The existence of quasi periodic solutions is determined by appearance of secondary Hopf bifurcation points [1,3]. A transition to chaotic response is also possible. The existence of cyclic-fold, subcritical period-doubling or subcritical Hopf bifurcation points is a preliminary requirement for appearance of chaos in the system [1]. There exist several methods for obtaining periodic responses of dynamical systems. The brute-force approach is the simplest one which uses a time integration scheme [7]. Another common approach is the harmonic balance method (HBM) where the solution is expressed in Fourier series and balance of harmonics is applied [8,9]. Periodic solutions can also be obtained by collocation method [10]. Shooting method computes iteratively the initial conditions which lead to periodic response [1]. Advantages of each of these methods are discussed in [11]. In this work, the shooting method is preferred for the analysis of large-scale dynamical systems because it is very suitable for implementation on distributed memory machines for parallel computing. The purpose of the paper is to present an efficient parallel realization of the shooting method for obtaining periodic responses of large-scale dynamical systems. The Bernoulli–Euler beam equation of motion is used as a model equation for studying scalability of the shooting method for linear and nonlinear systems. The potential of the proposed method to reallife applications is demonstrated on bridge structural component using three-dimensional finite elements and the linear equation of motion. Even though the proposed parallel implementation of the method is presented to forced vibrations, it is not limited to computing the nonlinear frequency-response function. It can also be applied to free vibration problems and for computing the nonlinear normal modes. In this case, predictor and corrector for the period of vibration should be defined, but the suggested parallel implementation can remain the same. 2. Parallel implementation of shooting method The shooting method is applied to systems of second order ODE. This formulation is preferred here, instead of the common formulation for systems of first order ODE, because of two reasons. First, the CPU time is significantly reduced, because the size of the independent systems (these systems arise from the shooting method and are shown in 2.1) remains equal to the size of the initial system, while if transformation into a system of first order ODE is performed, the size of the independent systems becomes double. Second, additional computations which arise from solving the system of first order ODE are not performed. These computations are related with the mass matrix. The computation of the inverse of the mass matrix can lead to loss of accuracy, it is time consuming and the resulting matrix is not sparse, hence it requires a lot of memory. Thus, such computation should be avoided and the mass matrix should remain on the left hand side of the system. On the other side, by keeping the mass matrix on the left hand side of the system, one still has to do additional computations for obtaining a numerical solution of the system. Thus, the application of the shooting method to large-scale systems of second order ODE is much more efficient when it is applied directly to the original system. A parallel implementation of the shooting method on multiple processors is proposed in this section. The sparse and dense matrices, which take part in the method, are pointed out and the distribution of the computations and memory among the processes is emphasized. Discussion about the scalability of the method to linear and nonlinear systems is presented. 2.1. Shooting method for systems of second order ODE The nonlinear equation of motion of elastic structure, after space discretization, has the following form: Mq¨ (t ) + Cq˙ (t ) + K (q (t )) q (t ) = f (t ) ,
(1)
where q (t ) is the vector of generalized coordinates, M represents the mass matrix, C—the damping matrix, K (q (t ))—the stiffness matrix that depends on vector q (t ) , f (t ) is the generalized vector of external forces. A harmonic excitation is considered, thus the vector of external forces is written as f (t ) = a cos (ωt ), where ω is the frequency of excitation. The total number of degrees of freedom (DOF) of the equation of motion (1) is denoted by N. The equation of motion has initial conditions denoted by: q (0) = q0 , q˙ (0) = q˙ 0 .
(2)
The shooting method consists of iterative computation of the initial conditions q0 and q˙ 0 which perform a periodic motion without transient response. Let the period of vibration be denoted by T . For linear systems T = 2π /ω and for nonlinear systems the period of vibration is an integer multiplied by 2π /ω. The equation of motion (1) is integrated in time domain, for one period of vibration. In
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
3
this work, Newmark’s time integration method is used. Together with the equation of motion (1), two additional systems, are also integrated. Each of these systems has N vectors of unknowns. These vectors are needed for correcting the initial conditions q0 and q˙ 0 . Details about the derivation of the shooting method for systems of second order ODE are given in [12]. The systems have the following form:
¨ d (t ) + CQ˙ d (t ) + J (q (t )) Qd (t ) = 0, MQ ¨ v (t ) + CQ˙ v (t ) + J (q (t )) Qv (t ) = 0, MQ
(3)
with initial conditions: Qd (0) = I,
˙ d (0) = 0, Q
Qv (0) = 0,
˙ v (0) = I, Q
(4)
˙ d , Qv and Q˙ v are matrices of dimension N × N , I is the identity matrix. M and C are the mass and damping where Qd , Q matrices from Eq. (1). J (q (t )) is the Jacobian of the internal force vector given by J (q (t )) = ∂K (q (t )) q (t ) /∂q (t ). For linear systems J does not depend on vector q (t ) and it is just the stiffness matrix of constant terms. The corrections of the initial conditions q0 and q˙ 0 , denoted by δ q0 and δ q˙ 0 , are obtained by solving the following system:
Qd (T ) − I
Qv (T )
δ q0 q0 − q (T , q0 , q˙ 0 ) = , ˙ v (T ) − I δ q˙ 0 q˙ 0 − q˙ (T , q0 , q˙ 0 ) Q
˙ d (T ) Q
(5)
˙ d (T ) and where Qd (T ) and Qv (T ) are solutions of systems (3) at time t = T , i.e. at the end of the period of vibration. Q ˙ v (T ) are the derivatives with respect to time. q (T , q0 , q˙ 0 ) and q˙ (T , q0 , q˙ 0 ) are the displacement and velocity vectors at Q time t = T . They depend on the initial conditions, which are written in brackets, in order to outline this dependence. The corrected initial conditions are given by: q (0) = q0 + δ q0 , q˙ (0) = q˙ 0 + δ q˙ 0 .
(6)
If linear systems are considered, the corrections of the initial conditions are obtained by one shooting procedure. In the case of nonlinear systems, several iterations can be performed till convergence is achieved. Their number depends on the initial guess of the initial conditions. 2.2. Parallel realization The process of obtaining periodic responses of dynamical systems by shooting method consists of several matrix–matrix multiplications and several solutions of linear systems. Such operations are computationally expensive. CPU time can be reduced by distributing these operations among multiple processors. A discussion about the efficient parallel realization of the method is presented in this sub-section. Large-scale dynamical systems often arise by space discretization with three-dimensional finite elements. The resulting matrices of such systems are sparse. On the other hand, the matrices Qd (t ) and Qv (t ) and their derivatives with respect to time are dense. Consequently system (5) is also dense. The efficient parallel realization of the shooting method requires the usage of basic matrix operations and solvers of linear systems for sparse and dense matrices. The proposed algorithm attempts to efficiently spread the computations and memory among multiple processors. By application of Newmark’s time integration method [7] to the equation of motion (1), the solution qt +1t at time t + 1t is obtained by solving the following system:
(a0 M + a1 C + K (qt +1t )) qt +1t = Ft +1t + (a0 M + a1 C) qt + (a2 M + a4 C) q˙ t + (a3 M + a5 C) q¨ t
(7)
where a0 = 1/β 1t , a1 = γ /β 1t , a2 = 1/β 1t, a3 = 1/2β − 1, a4 = γ /β − 1, a5 = 1t γ /2β − 1 are the constants of Newmark’s method, β and γ are parameters that can be determined to obtain accuracy and stability of the time integration method, 1t is the time step. qt , q˙ t and q¨ t are displacement, velocity and acceleration at time t. At the same time step, the solutions of Eqs. (3), Qdt +1t and Qvt +1t , are computed by the same time integration scheme: 2
(a0 M + a1 C + J (qt +1t )) Qdt +1t = (a0 M + a1 C) Qdt + (a2 M + a4 C) Q˙ dt + (a3 M + a5 C) Q¨ dt (a0 M + a1 C + J (qt +1t )) Qvt +1t = (a0 M + a1 C) Qvt + (a2 M + a4 C) Q˙ vt + (a3 M + a5 C) Q¨ vt .
(8)
Unlike system (7), where the unknown qt +1t is a vector, the unknowns in systems (8), Qdt +1t and Qvt +1t are dense matrices. Each vector-column of these matrices presents a solution, which is independent on the other vector-columns. A suitable parallel realization of the shooting method presents a separation of the matrices Qdt +1t and Qvt +1t into blocks of vector-columns. The partition of the dense matrices starts from the beginning of the shooting procedure, i.e. from the initial conditions given in (4). The same partition has to be applied to the velocity and acceleration of Qd and Qv .
4
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
By the proposed separation of the dense matrices, the computations of matrix–matrix multiplications, given on the right hand side of (8) are spread among the available processors. Furthermore, the solution procedure is also spread among the processors. The right hand side of systems (8) presents a block of vector-columns instead of a matrix. Each process computes a solution of several unknown vectors (the number depends on the available processors), instead of computing a solution of N unknown vectors, where N is the dimension of the system and consequently the size of each vector. The matrices M, C, K and J are sparse matrices which gives several advantages for memory storage and solution of linear systems. The library UMFPACK [13,14] which is a direct solver for sparse linear systems is used. ˙ d (T ) , Qv (T ) and Q˙ v (T ) are gathered and At the end of the time interval, i.e. at t = T , the partitions of matrices Qd (T ) , Q system (5) is composed. The dense linear system is solved by ScaLAPACK [15], which is library of high-performance linear algebra routines for parallel distributed memory machines and it is suitable for dense linear systems. The linear equations of motion have several advantages on the solution process. The stiffness matrix is a matrix of constant terms and the Jacobian of the internal forces presents the same matrix. Thus, systems (7) and (8) have common left hand side, it is possible to compute the factorization of this matrix once and to use the factorized matrix for the whole shooting iteration. If nonlinear equation of motion (1) is analyzed, the algebraic system (7) is also nonlinear. Thus, several internal iterations, for example by Newton–Raphson’s method, are needed for obtaining the solution at each time step. The Jacobian will be different for each time step and initial factorization cannot be used. However, the main consumption of CPU time is the matrix–matrix multiplications in (8) and the solution of the obtained independent linear systems. Thus, it is expected that the scalability and efficiency of the parallel realization of the shooting method for linear and nonlinear systems will be similar. Additional difference in nonlinear systems is that the period of vibration can be T = 2π /ω or an integer multiplied by 2π/ω due to appearance of period multiplying bifurcation point. Furthermore, more than one shooting iteration will be needed for the iterative computation of the initial conditions. This will increase the total CPU time, but will not influence the scalability of the algorithm. 3. Validation The parallel implementation of the shooting method is validated in this section. The efficiency and acceleration of the proposed parallel implementation are presented in the next section. For that purpose, the equation of motion of a Bernoulli–Euler beam is used. It is given by [16]:
∂ 2 w (x, t ) ∂ 4 w (x, t ) 3 ∂w (x, t ) 2 ∂ 2 w (x, t ) µ + EI − EA = f (t ) , ∂t2 ∂ x4 2 ∂x ∂ x2
(9)
where w (x, t ) is the transverse displacement of the beam, µ is mass per unit length, E is the elastic modulus, A is the cross sectional area, I is the second moment of area and f (t ) is the external force. Eq. (9) is discretized by the finite element method, Hermite cubic polynomials are used as shape functions. The dimension of the local mass and stiffness matrices is 4 × 4. Mass proportional and frequency dependent damping [17] is introduced in the system. A clamped–clamped beam with the following geometrical properties is used for the numerical results: length l = 0.58 m, width b = 0.02 m and thickness h = 0.002 m. The material is assumed to be isotropic and elastic (aluminum) with Young’s modulus E = 70 GPa and density ρ = 2778 kg/m3 . A harmonic force with amplitude of 0.024 N is applied on the middle of the beam. The results from the parallel run of the shooting method are compared with results obtained by the p-version of the FEM [18] and harmonic balance method. Sequential continuation method is used for defining a prediction for the next point from the FRF. The results are presented in Fig. 1. 4. Efficiency of shooting method The speedup and efficiency of the parallel algorithms are essential for evaluation of their performance. Scalability of the shooting method is studied in this section. The numerical computations are carried out on HPCG cluster [19] located at IICT-BAS. The cluster has 576 computing cores (Intel Xeon X5560 @ 2.8 GHz) organized in blade system with 36 blades BL 280c. Each blade has 24 Gb RAM. Non-blocking DDR interconnection via Voltaire Grid director 2004 with latency 2.5 µs and bandwidth 20 Gbps is used to connect the blades. The equation of motion of the Bernoulli–Euler beam (9) is used for studying the efficiency of the method. A system of ODE is generated by using 8193 elements. Because clamped–clamped boundary conditions are considered and the local mass and stiffness matrices are with dimension 4 × 4, the total number of DOF is N = 16 384. The time interval is divided on 128 equal time steps. Scalability of the method applied to the linear and to the nonlinear equation of motion is presented. The complete shooting iteration consists of solving systems (7) and (8) for one period of vibration and obtaining a solution of the linear system (5). The efficiency study of the parallel computations is separated into two parts. The first one takes into account the time integration process and the second one—the solution of the dense linear system (5).
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
5
Fig. 1. Comparison of FRF of Bernoulli–Euler beam. —– results obtained by p-version FEM, p = 10, and harmonic balance method, • • • results obtained by parallel run of shooting method, w —amplitude on the middle of the beam, h—thickness, ω—excitation frequency, ωl —fundamental frequency. (a) linear model, (b) geometrically nonlinear model. Table 1 Strong scalability results of one time integration step of shooting method. System with 16 384 DOF is used. It results from discretization of one dimensional linear beam equation of motion. P
CPU (s)
Speed up
Efficiency %
32 64 128 256
14.57 7.18 3.57 1.77
– 2.03 4.08 8.22
– 101.51 101.96 102.81
Table 2 Strong scalability results of PDGESV subroutine of ScaLAPACK library. Block size of 64 elements is used for two-dimensional block cyclic distribution. The dimension of the dense matrix from Eq. (5) is 32 768 × 32 768. One dimensional linear beam equation of motion is used for defining the dynamical problem. P
CPU (s)
Speed up
Efficiency %
32 64 128 256
1489.58 672.89 346.25 168.82
– 2.21 4.30 8.82
– 110.68 107.55 110.30
Strong scalability of average CPU time for one time integration step is presented in Table 1 for the linear equation of motion. One time integration step consists of six matrix–matrix multiplications and computation of the solutions of systems (7) and (8). Average CPU time for one time integration step is preferred because it is representative for the first part of the shooting method, i.e. the time integration process. The results confirm that the proposed parallel implementation of the shooting method is very suitable for distributed memory machines. The efficiency slightly increases with the number of processors, because the number of the independent systems decreases per process, when more processors are used, and because the systems are solved independently, hence communications between the processes are performed after this step. The second part of the shooting process, the solution of system (5), is obtained by ScaLAPACK library. Although its parallel performance has been investigated [15], the acceleration and efficiency of ScaLAPACK, for the system obtained by the shooting method (5), are presented in Table 2. Results about the performance of ScaLAPACK library are included in the paper, because the solution of the dense matrix is time consuming and its efficient solution by parallel processors is essential for the efficiency of the complete shooting algorithm. The efficiency and acceleration of the complete method are presented in Table 3. The decrease of the efficiency is explained by the additional communications between the processes which are performed at the end of the time integration ˙ d (T ), Qv (T ) and Q˙ v (T ) and to process. These communications are necessary to gather the partitions of matrices Qd (T ) , Q perform the two-dimensional block cyclic distribution. Such distribution of the dense matrix is necessary to be performed among the processors, in order to solve the dense system by ScaLAPACK. Additional comparisons of the CPU time for one time integration step are presented in Table 4. Models with different number of degrees of freedom and using different number of processors are considered. By increasing the dimension of the system twice, the computational time of the integration part of the shooting method is increased approximately four times. The additional computational cost comes from the increased number of linear systems. When the number of unknowns is doubled, the number of linear systems (8) is also doubled.
6
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
Table 3 Strong scalability results of complete iteration of shooting method. System with 16 384 DOF is used. It results from space discretization of one dimensional linear beam equation of motion. P
CPU (s)
Speed up
Efficiency %
32 64 128 256
3505.15 1696.81 879.72 519.73
– 2.07 3.98 6.74
– 103.29 99.61 84.30
Table 4 CPU time for one time integration step and required memory for the dense matrices per processor. Results with different number of CPUs and elements are compared. System with 16 384 DOF which results from discretization of one dimensional linear beam equation of motion is used. Elements
64 CPU
DOF CPU per time step (s) Memory per processor (Mb)
128 CPU
256 CPU
4097
8193
16 385
4097
8193
16 385
4097
8193
16 385
8192 1.63 104
16 384 7.33 416
32 768 – 1664
8192 0.83 52
16 384 3.43 208
32 768 14.85 832
8192 0.41 26
16 384 1.75 104
32 768 7.29 416
Table 5 Strong scalability results of one time integration step of shooting method applied to the nonlinear equation of motion. System with 16 384 DOF is used. It results from discretization of one dimensional nonlinear beam equation of motion. P
CPU (s)
Speed up
Efficiency %
16 32 64 128
30.04 15.16 7.75 4.00
– 1.98 3.87 7.52
– 99.09 96.83 93.94
The required memory per processor is also included in Table 4. Only the memory of the dense matrices is considered because they are partitioned among the available processors. 13 dense matrices are necessary for the shooting method: Qd , Qv and their first and second derivatives at two time steps (the current step and the previous one) plus one additional matrix which is used for keeping temporary data. The sparse matrices and the vectors require much less memory in comparison to the dense matrices, thus they are not included. Strong scalability of the method for the nonlinear equation of motion is presented in Table 5. Only the average CPU time for one time integration step is presented for the nonlinear problem, because the remaining part of the method is the same as for the linear equation of motion, i.e. gathering the elements of the dense matrix and the parallel solution of system (5). The efficiency of the nonlinear problem becomes slightly worse than the efficiency obtained for the linear problem. The differences come from the additional computations which are required on each time step: (1) the computations of the Jacobian and the nonlinear matrix, which are different on each time step; and (2) the factorization of the left hand side matrix of systems (8) has to be computed also on each time step. Nevertheless, the majority of computations are the matrix–matrix products and the solutions of the independent systems of Eqs. (8), thus the scalability of the parallel implementation of the shooting method applied to the nonlinear problem remains good. 5. Application to bridge structural component The potential of the proposed parallel implementation of the shooting method for investigating the dynamical behavior of real-life structures is demonstrated in this section. A bridge structural component is used for the numerical examples. The structure is shown in Fig. 2. The dimensions of the structure are (notations from Fig. 2 are used): length l = 22 m, width b = 1 m, thickness h = 0.25 m, inner radius r1 = 10 m, outer radius r2 = 10.5 m, angle ϕ = π /3 rad. The material is assumed to be aluminum, hence the same material properties from Section 3 are used. The structure is assumed to have clamped boundary conditions on both foundations and on both sides of the planar part. The finite element method is applied on the three-dimensional linear equation of elasticity [18] for deriving the system of ODE (1). The structure is discretized by quadratic tetrahedron elements, each element has 10 nodes and 30 degrees of freedom. In the numerical experiment, a mesh which results into 9906 DOF is used. The Elmer software [20] is used for defining the local mass and stiffness matrices and performing the assembling process of the elements. The structure of the sparse matrices, obtained from the discretization of the example from Fig. 2, differs from the one of the beam element because three-dimensional finite elements are used. Therefore, the scalability of the three-dimensional example is investigated first and then the complete FRF of the bridge component is presented. Strong scalability results, but for the bridge component, are presented in Tables 6–8. It is pointed out that the CPU time for one time integration step is
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
7
Fig. 2. Bridge structural element used for the numerical analysis performed by shooting method. Table 6 Strong scalability results for one time integration step of shooting method. System with 9906 DOF which results from discretization of three-dimensional equation of elasticity is used. P
CPU (s)
Speed up
Efficiency %
16 32 64 128
54.55 26.10 12.54 6.16
– 2.09 4.35 8.86
– 104.50 108.76 110.70
Table 7 Strong scalability results of PDGESV subroutine of ScaLAPACK library. The dimension of the dense matrix from Eq. (5) is 19 812 × 19 812. Three-dimensional equation of elasticity is used for defining the dynamical problem. P
CPU (s)
Speed up
Efficiency %
16 32 64 128
409.82 186.28 94.08 47.68
– 2.20 4.36 8.60
– 110.00 108.90 107.44
Table 8 Strong scalability results of complete iteration of shooting method. System with 9906 DOF which results from space discretization of the three-dimensional equation of elasticity is used. P
CPU (s)
Speed up
Efficiency %
16 32 64 128
7444.32 3694.12 1890.68 1003.84
– 2.02 3.94 7.42
– 100.76 98.43 92.70
bigger than the beam case, even though the DOF are less. This results from the usage of three-dimensional finite elements, which results into a sparse matrix with bigger bandwidth than the one form the beam model. The results demonstrate that the proposed parallel implementation of the shooting method, in combination with ScaLAPACK library, efficiently computes the periodic responses of elastic structures discretized by three-dimensional finite elements. The natural frequencies and the corresponding modes of vibration are essential for analyzing the frequency-response function. For linear systems, they determine the resonances of the structure and the shapes of vibration. The first four modes and the natural frequencies of the structure are presented in Fig. 3. It is pointed out that the first mode of vibration presents motion in yz plane. A uniformly distributed harmonic force is applied on the upper surface of the structure. The amplitude of the force is 100 kN/m2 . A damping proportional to the mass matrix and dependent of the excitation frequency is used in the model. A damping factor α = 0.01 is considered, i.e. the damping matrix is expressed by [17]: C=
0.01ωl2
ω
M.
(10)
The excitation frequency ranges from 60 to 200 rad/s. The FRF function is measured at two points on the upper surface of the bridge, for values of the longitudinal coordinate x = 10.5 m and x = 0 m. The results are shown in Fig. 4.
8
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
Fig. 3. Natural modes and frequencies of vibration of the bridge component from Fig. 2. The initial structure is also shown with the finite element mesh.
Fig. 4. Frequency-response function of the bridge component due to external harmonic excitation with uniformly distributed load of 100 kN/m2 , ω— excitation frequency (rad/s), w —amplitude (m) measured at (a) x = 10.5 m, (b) x = 0 m.
The maximum amplitude of the FRF is achieved for values of the excitation frequency equal to the third and fourth natural frequencies. The third and fourth modes present vibration in the plane of the applied load. Because the equation of motion is linear, the response of the beam presents a combination (superposition) of linear modes of vibration [1]. The maximum displacements of the structure are presented in Fig. 5, for three different excitation frequencies. Resonance does not appear for excitation frequency equal to the fundamental natural frequency, because the applied force is only in xz plane. Resonance does not appear also for excitation frequency equal to the second natural frequency, because the applied force is uniformly distributed and hence, symmetric with respect to xz plane. The second mode is asymmetric with respect to this plane and can be excited with asymmetric force. 6. Conclusion A parallel realization of the shooting method and its application to large-scale dynamical systems was presented in this paper. Such systems arise by space discretization of real engineering applications. The shooting method is suitable
S. Stoykov, S. Margenov / Journal of Computational and Applied Mathematics (
)
–
9
Fig. 5. Displacement of the bridge structure due to uniformly distributed load of 100 kN/m2 with excitation frequency (a) ω = 102.48 rad/s, (b) ω = 146.0 rad/s, (c) ω = 158.79 rad/s.
for dynamical analysis of elastic structures in frequency domain. Thus, its parallel realization is essential for efficient computation of the frequency-response functions of large-scale dynamical systems. The efficiency and scalability of the method were presented for linear and nonlinear dynamical problems. It was shown that the method scales close to linearly with the proposed parallel implementation, when the linear equation of motion is analyzed. For nonlinear equations of motion, the scalability of method still remains very good, even though it is not linear. The additional computations required for solving the nonlinear problem influence the scalability of the parallel realization. Further improvements can be considered by implementing parallel algorithms in the computation of the Jacobian, the nonlinear matrix and the factorization of the common left hand side of the independent linear systems. The applicability of the method to real-life structures was presented on a bridge component, discretized by threedimensional finite elements. The linear frequency-response function was presented. The shapes of vibration, for different excitation frequencies were also shown. The future work will investigate the performance of the method on geometrically nonlinear systems obtained by threedimensional finite elements. Stability of the solution, bifurcation points and the corresponding secondary branches will be presented. Acknowledgment The research work carried out in the paper is supported through the project AComIn ‘‘Advanced Computing for Innovation’’, grant 316087, funded by the FP7 Capacity Programme (Research Potential of Convergence Regions). References [1] A. Nayfeh, B. Balachandran, Applied Nonlinear Dynamics: Analytical Computational and Experimental Methods, John Wiley & Sons Inc., New York, 1995. [2] F. Verhulst, Nonlinear Differential Equations and Dynamical Systems, Springer, Berlin, 1996. [3] R. Seydel, Practical Bifurcation and Stability Analysis, third ed., Springer, New York, 2010. [4] T. Belytschko, W.K. Liu, B. Moran, Nonlinear Finite Elements for Continua and Structures, John Wiley & Sons Ltd., Chichester, 2000. [5] A. Nayfeh, D. Mook, Nonlinear Oscillations, John Wiley & Sons, 1995. [6] A. Vakakis, Non-linear normal modes (NNMs) and their applications in vibration theory: an overview, Mech. Syst. Signal Process. 11 (1997) 3–22. [7] K. Bathe, Finite Element Procedures, Prentice-Hall, New Jersey, 1996. [8] W. Szemplinska-Stupnicka, The Behaviour of Nonlinear Vibrating Systems, Kluwer Academic Publishers, Dordrecht, 1990. [9] P. Ribeiro, Nonlinear vibrations of simply-supported plates by the p-version finite element method, Finite Elem. Anal. Des. 41 (2005) 911–924. [10] K. Engelborghs, E. Doedel, Stability of piecewise polynomial collocation for computing periodic solutions of delay differential equations, Numer. Math. 91 (2002) 627–648. [11] G. Kerschen, M. Peeters, J. Golinval, A. Vakakis, Nonlinear normal modes, Part I: A useful framework for the structural dynamicist, Mech. Syst. Signal Process. 23 (2009) 170–194. [12] S. Stoykov, S. Margenov, Numerical computation of periodic responses of nonlinear large-scale systems by shooting method, Comput. Math. Appl. 67 (2014) 2257–2267. [13] T. Davis, Algorithm 832: UMFPACK, an unsymmetric-pattern multifrontal method, ACM Trans. Math. Softw. 30 (2004) 196–199. [14] T. Davis, Direct Methods for Sparse Linear Systems, SIAM, ISBN: 0898716136, 2006. [15] L.S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R.C. Whaley, ScaLAPACK Users’ Guide, Society for Industrial and Applied Mathematics, ISBN: 0-89871-397-8, 1997. [16] L. Meirovitch, Elements of Vibration Analysis, McGraw-Hill Book Company, New York, 1986. [17] S. Stoykov, P. Ribeiro, Stability of nonlinear periodic vibrations of 3D beams, Nonlinear Dynam. 66 (2011) 335–353. [18] O. Zienkiewicz, R. Taylor, J. Zhu, The Finite Element Method: Its Basis and Fundamentals, sixth ed., 2005. [19] E. Atanassov, T. Gurov, A. Karaivanova, S. Ivanovska, M. Durchova, D. Georgiev, D. Dimitrov, Tuning for Scalability on Hybrid HPC Cluster, in: Mathematics in Industry, Cambridge Scholar Publishing, ISBN: 978-1-4438-6401-5, 2014, pp. 64–77. [20] Elmer web site: www.csc.fi/elmer (last time visited 28/03/2015).