Journal of Computational Physics 252 (2013) 382–403
Contents lists available at SciVerse ScienceDirect
Journal of Computational Physics www.elsevier.com/locate/jcp
Accelerated 3D multi-body seakeeping simulations using unstructured finite elements Borja Serván-Camas a,∗ , Julio García-Espinosa a,b a b
Centre Internacional de Mètodes Numèrics en Enginyeria (CIMNE), Division of Naval Research, Gran Capitan s/n, 08034 Barcelona, Spain Universitat Politècnica de Catalunya, BarcelonaTech (UPC), Campus Nàutica, Edif. NT3, C. Escar 6-8, 08039 Barcelona, Spain
a r t i c l e
i n f o
Article history: Received 24 July 2012 Received in revised form 17 April 2013 Accepted 18 June 2013 Available online 1 July 2013 Keywords: Seakeeping Multi-body Finite element Time domain Unstructured mesh GPU
a b s t r a c t Being capable of predicting seakeeping capabilities in the time domain is of great interest for the marine and offshore industries. However, most computer programs used work in the frequency domain, with the subsequent limitation in the accuracy of their model predictions. The main objective of this work is the development of a time domain solver based on the finite element method capable of solving multi-body seakeeping problems on unstructured meshes. In order to achieve this objective, several techniques are combined: the use of an efficient algorithm for the free surface boundary conditions, the use of deflated conjugate gradients, and the use of a graphic processing unit for speeding up the linear solver. The results obtained by the developed model showed good agreement with analytical solutions, experimental data for an actual offshore structure model, as well as numerical solutions obtained by other numerical method. Also, a simulation with sixteen floating bodies was carried out in an affordable CPU time, showing the potential of this approach for multi-body simulation. © 2013 Elsevier Inc. All rights reserved.
1. Introduction Seakeeping is a topic of great interest in marine and offshore engineering. This interest is growing in the last years due to the boost given by the development of marine renewable energies. In this context, the development of time domain seakeeping programs is a main request from the industry. Moreover, the simulation of multi-body systems is a key point in the development of more efficient marine renewable technologies such as wave energy converters, floating wind turbines, etc. Up to date the numerical simulation of seakeeping has been mostly carried out in the frequency domain. The reason might be that computational cost of time domain simulations were too high and too long when compared to those of frequency domain. Moreover, assumptions like linear waves and the harmonic nature of water waves made the frequency domain to be the right choice. However, nowadays computing capabilities make possible to carry out numerical simulations in the time domain in a reasonable time. Time domain simulation has the advantage of simulating phenomena that cannot be handled in frequency domain such as transient situations due to wave amplitude modulation over time, parametric resonance, and other nonlinear effects. Furthermore, in the frequency domain, the simulation of multi-body systems requires calculating the interaction among the bodies, which increases quickly the computational effort as the number of bodies increase. On the other hand, when simulating in the time domain, the interaction among bodies is solved in a natural way without leading to a big increase of computational effort.
*
Corresponding author. Tel.: +34 93 405 4627. E-mail address:
[email protected] (B. Serván-Camas). URL: http://www.cimne.com/ (B. Serván-Camas).
0021-9991/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jcp.2013.06.023
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
383
Regarding the numerical method traditionally adopted for seakeeping simulation, the boundary element method (BEM) has dominated over others like finite element method (FEM). We might find the reason in the fact that most of the computational effort is spent in solving the Laplace equation. Then it might look like BEM offers a lower computational effort. However, Cai et al. [1] carried out a heuristic study regarding the computational effort required for solving the Laplace equation by BEM and FEM. This study concluded that for a similar three-dimensional problem and a discretization size, the number of unknowns of BEM and FEM are O ( N 2 ) and O ( N 3 ), respectively, being N the number of unknowns needed in one dimension to achieve the desired discretization. But the computational costs are O ( N 4 ) and O ( N 3 ) respectively. Therefore, while BEM might be more efficient for problems with low number of unknowns, FEM becomes more efficient respect to BEM as the number of unknowns N increases. Considering that computational capabilities continuously increases, so does the complexity of problems to be undertaken, and the number of unknowns required. Hence FEM might become more efficient than BEM. Another advantage of FEM is the fact that it has been conceived for naturally solving the Laplace equation on unstructured meshes. This makes easier the discretization of complex computational domains. In the last decade, there have been extensive applications of the finite element method (FEM) to free surface problems. For example, Oñate and García [2] presented a stabilized FEM for fluid structure interaction in the presence of free surface where the latter was modeled by solving a fictitious elastic problem on the moving mesh. In [3,4] Löhner et al. developed a FEM capable of tracking violent free surface flows interacting with objects. Also García et al. [5] developed a new technique to track complex free surface shapes. However, many works like the previous ones are based on solving the Navier–Stokes equations, too expensive computationally speaking when it comes to simulating real problems regarding ocean waves interacting with floating structures. These sorts of problems can be more cheaply simulated using potential flow theory along with Stokes’ perturbation approximation. For details on Stokes’ wave theory, the reader is referred to [6]. With regards to wave–body interaction problems, there has been extensive work as well in the last decade. In [7], Wu and Eatock Taylor used both the FEM and the mixed FEM to analyze the two-dimensional (2D) nonlinear transient water wave problems. Later Wu and Eatock Taylor [8] made a detailed comparison between FEM and the boundary element method (BEM) for the nonlinear free surface flow problem and found that the former was more efficient in terms of both CPU and memory requirement. Greaves et al. [9] employed quad-tree-based unstructured meshes to model fully nonlinear waves in 2D, using an ALE formulation in structured meshes. In [10] an hp-element technique was adopted to simulate the 2D free surface flow problem. In [11] and [12], an implementation of FEM schemes to simulate 3D wave–body interaction was introduced using moving meshes along with an explicit time marching scheme for the free surface boundary condition. However, in those cases, re-meshing and interpolation were needed, which leads to a high CPU cost. Westhuis [13] in his PhD dissertation developed a FEM code for nonlinear waves and focused in the development of groups of waves. The code relied in some specific structured mesh configurations and without body interaction. Hu et al. [14] applied FEM to study the case of a vertical cylinder under forced motions based on the works [7] and [8]. Turnbull et al. [15] coupled structured and unstructured meshes to simulate 2D wave–body interactions. The vertical velocity at nodes belonging to the free surface required a prescribed number of nodes to be aligned vertically. Wu et al. [16] solved a 3D problem using a semistructured mesh in the vertical direction. This way the nodes will be aligned vertically and the vertical component of the velocity at the free surface can be easily estimated by finite difference. Wang et al. [17] used FEM to study the effect of second-order wave sloshing within a tank in 2D. The fourth-order Runge–Kutta method was used as a time marching scheme for the free-surface boundary condition. A FEM solver for a second-order wave diffraction by an array of vertical cylinder using semistructured mesh has been presented in [18]. Again, in order to estimate vertical velocity at the free surface nodes it is required a prescribed number of nodes to be aligned vertically. An explicit fourth-order Adams–Bashforth scheme was used as a time marching scheme for the free surface boundary condition. Later on, the same authors in [19] used a structured mesh based on rectangular elements to study second-order resonance effects. Yan et al. [20] applied the fully nonlinear potential for modeling overturning waves. To achieve that, a moving mesh technique was adopted to track down the free surface. Consequently, computational times are large. Recently, Song et al. developed a new scaled boundary finite element method (SBFEM) for linear waves and structure interaction [21]. The SBFEM works in the frequency domain, and base functions for boundary elements based on Hankel functions were used for unbounded sub-domains where waves asymptotically disappear. This leads to a decrease in the number of elements needed for the simulation which improves the numerical efficiency of the method. Despite of the great effort invested in the last years to the development of FEM algorithms, to the authors’ knowledge, yet there has not been developed a fast FEM (by fast we mean in the order of minutes for practical problems) for solving first-order wave structure interaction in the time domain using unstructured meshes. The use of structure or semi-structure meshes is a big limitation since it limits the complexity of the geometry to be used. In this study we present a FEM for wave–structure interaction that can be used with unstructured meshes. Besides, since it is based on Stokes’ wave theory, no re-meshing or moving mesh technique are needed, which keeps computational costs and computational times low. The algorithm has been adapted to include nonlinear external forces, like those used to define mooring systems. Solver deflation is gaining popularity as a way to improve convergence in linear solvers, and therefore to reduce solver iteration and CPU time. Deflation works by solving in a fast and coarse manner eigenmodes associated with the lower eigenvalues [22,23] (slow convergence modes). Therefore, it will better improve convergence in those problems with a larger range of eigenmodes, such as flows in pipes.
384
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
The outline of this paper is as follows. In Section 2, the statement of the governing equations of the first-order diffraction–radiation problem of a set of floating bodies is presented. In Section 3 the FEM formulation is presented, the free surface and radiation boundary condition are applied to the problem, and the boundary condition on the bodies and the body dynamics solution are explained in detail. The accuracy of the new formulation for analysis of waves and floating structures interaction is verified in different validation cases in Section 4, comparing against analytical, experimental and BEM solutions. Finally Section 5 contains the conclusions of this work. 2. Problem statement 2.1. Governing equations We consider the first-order diffraction–radiation problem for a set of floating bodies:
∇ 2 ϕ = 0 in Ω
(1)
∂t ϕ + g ξ + P /ρ = P a /ρ on z = 0 (dynamic free surface boundary condition)
(2)
∂t ξ − ∂z ϕ = 0 on z = 0 (kinematic free surface boundary condition)
(3)
∂z ϕ = 0 on z = − H
(4)
∇ ϕ · nB = vB · nB
(5)
on Γ B
where ϕ and ξ are the first-order potential and free surface elevation respectively; Ω is the fluid domain; z = 0 represents the mean free surface level; P is the local pressure over the free surface; P a is the atmospheric pressure; ρ is the water density; Γ B represents the wetted surface of the present bodies; v B represents the local body velocity on the body surface; and H is the water depth. The domain is assumed to be infinite in the horizontal directions. 2.2. Velocity potential decomposition The aim of this work is to simulate the dynamics of a set of floating bodies subjected to the action of waves. To do so, we will model the environment as the sum of a number of Airy waves. Airy waves are solution of the mathematical model in the absence of bodies and free surface pressure P equal to an atmospheric pressure P a . They can be expressed in terms of their velocity potential and free surface elevation as follows:
Am g cosh(|km |( H + z)) cos |km |(x cos αm + y sin αm ) − ωm t + δm ωm cosh(|km | H ) m ζ = Am sin |km |(x cos αm + y sin αm ) − ωm t + δm ψ=
(6)
where A m are the wave amplitudes; ωm are the wave frequencies; km are the wave numbers; αm are the wave directions; and δm are wave phases. From this point on, we will refer to ψ as the incident potential. The potential ψ , along with the 2 free surface elevation ζ and the dispersion relation ωm = k|km | tanh(|km | H ), fulfills the following equations:
∇ 2 ψ = 0 in Ω
(7)
∂t ψ + g ζ = 0 on z = 0 (dynamic free surface boundary condition)
(8)
∂t ζ − ∂z ψ = 0 on z = 0 (kinematic free surface boundary condition)
(9)
In order to obtain the solution to the governing equations, we will use a velocity potential decomposition. Let ϕ and ξ be the solution to the governing equations. Then ϕ and η can be decomposed as ϕ = ψ + φ and ξ = ζ + η , where φ and η represent the velocity potential and free surface elevation of waves diffracted and radiated by the bodies. Introducing the velocity potential decomposition into the governing equations, we obtained the equation to be fulfilled by φ :
∇ 2 φ = 0 in Ω
(10)
∂t φ + g η + P /ρ = P a /ρ on z = 0 (dynamic free surface boundary condition)
(11)
∂t η − ∂z φ = 0 on z = 0 (kinematic free surface boundary condition)
(12)
∂z φ = 0 on z = − H
(13)
∇φ · n B = (v B − ∇ψ) · n B
on Γ B
(14)
Our problem is to find φ and η for a given set of incident waves and v B . To do so, we will solve Eqs. (10)–(14) in a finite domain by means of the finite element method.
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
385
2.3. Radiation condition and wave dissipation Waves represented by φ are born at the bodies and propagate in all directions away from the bodies. These waves have to either be dissipated or be let go out of the domain so they will not come back and interact with the bodies. Then, a Sommerfeld radiation condition at the edge of the computational domain will be used:
∂t φ + c R ∇φ · n R = 0 on Γ R
(15)
where Γ R is the surface limiting of the domain in the horizontal directions, and c R is a prescribed wave velocity. Eq. (15) will let waves with phase velocity c R to escape out the domain. However, waves with different phase velocities will not be leaving the domain. Then, wave dissipation is inserted into the dynamic free surface boundary condition by varying the pressure at the free surface such that:
P /ρ = P a /ρ + κ (x)∂z φ
(16)
where κ (x) is a damping coefficient. Eq. (16) increases pressure when the free surface is moving upwards, while decreases the pressure when the free surface is moving downwards. Then, energy is transferred from the waves to the atmosphere, leading to wave damping. However, the coefficient κ (x) will be set to zero in the area nearby the bodies to make sure damping will no affect to the wave–structure interaction. Combining the dynamic and kinematic boundary conditions, introducing Eq. (16), and adding Eq. (15), the governing equations for φ and η become:
∇ 2 φ = 0 in Ω
(17)
∂tt φ = − g ∂z φ − κ (x)∂t ∂z φ on z = 0
(18)
∂z φ = 0 on z = − H
(19)
∇φ · n B = (v B − ∇ψ) · n B
on Γ B
(20)
∂t φ + c R ∇φ · n R = 0 on Γ R 1
P
g
ρg
η = − ∂t φ −
(21)
on z = 0 (kinematic free surface boundary condition)
(22)
Eqs. (17)–(21) are a closed system of equations for the velocity potential. Therefore, the free surface elevation decoupled from the problem.
η has been
3. Finite element formulation 3.1. Laplace solver This section presents the FEM formulation to solve the Laplace equation along with boundary conditions. This formulation has been developed to be used in conjunction with unstructured meshes. The use of unstructured meshes enhances geometry flexibility and speed ups the initial modeling time. An automatic unstructured grid generator based on the advancing front method is used to generate triangular surface and tetrahedral volume meshes. Let Q h∗ be the finite element space to interpolate functions, constructed in the usual manner. From this space, we can construct the subspace Q h,φ , that incorporates the Dirichlet conditions for the potential φ . The space of test functions, denoted by Q h , is constructed as Q h,φ , but with functions vanishing on the Dirichlet boundary. The weak form of the Laplace problem can be written as follows. Find [φn ] ∈ Q h,φ , by solving the discrete variational problem:
B h · φn
∇ νh · ∇φh dΩ = Ω
φnB ,
Γ
φ nR ,
Z
0
Z
ν B
R h · φn
dΓ + Γ
ν R
dΓ + Γ
Z
νh · φn dΓ + Z0
0
Γ
νh · φnZ − H dΓ
∀νh ∈ Q h
(23)
Z−H
−H
where φn and φn are the potential normal gradients corresponding to the Neumann boundary conditions on bodies, radiation boundary, free surface and bottom, respectively. At this point, it is useful to introduce the associated matrix form [24]:
Lφ = b B + b R + b Z 0 + b Z − H
(24)
where L is the standard finite element Laplacian matrix, and b B , b R , b Z 0 and b Z − H are the vectors resulting from integrating the corresponding boundary condition terms [24]. Regarding the bottom boundary for the refracted and radiated potential, it is imposed naturally in FEM by b Z − H = 0.
386
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
3.2. Free surface boundary condition Solving the free surface boundary condition efficiently is the most important point of the problem stated using unstructured meshes. Since free surface boundary condition represents the temporal evolution of the velocity potential, it is traditionally used time marching schemes such as the fourth-order Runge–Kutta (RK4) and fourth-order Adams–Bashforth (AB4). The RK4 is an explicit four steps methods that required estimation of intermediate point to advance from time t to time t + t. Then, the Laplace equation has to be solved four times and ∂z φ has to be estimated each time. On the other hand, the AB4 is an explicit scheme that only requires solving the Laplace equation and estimate ∂z φ once per time step. However the AB4 requires storing information at the free surface of the previous 4 time steps. For both algorithms, RK4 and AB4, the values of ∂z φ at the free surface are to be estimated so that the scheme can keep marching in time. Following this reasoning, an implicit scheme looks like an expensive option since convergence of the Laplace solution and the free surface numerical scheme would need to be achieved through an iterating procedure. Moreover, based on a literature review, the estimation of ∂z φ is usually carried out by finite difference schemes requiring the first layer of nodes to be vertically aligned. From our point of view this is a big restriction since not completely unstructured meshes can be used near the free surface. In order to overcome the above mentioned limitations, we propose to use a fourth-order compact Padé scheme [25]. This scheme contains desirable properties such as: being implicit with symmetric stencils, good stability properties, and requiring solving the linear system in Eq. (24) just once per time step (rather than several times like RK4 or AB4). Although the free surface boundary condition is traditionally implemented as a Dirichlet boundary condition [13,18,19], in this work it is implemented as a Neumann boundary condition that fulfills the flux boundary integral: Z
b Z 0 = MΓ Z 0 φ z 0
(25) Z φz 0
where MΓ Z 0 is the corresponding boundary mass matrix [24]. Rather than obtaining the vector and calculating the values of b Z 0 , we will proceed in a different manner. Let’s consider the free surface boundary condition where the absorbing factor equals zero. The fourth-order compact Padé scheme reads, for Eq. (18), as:
φ n +1 − 2φ n + φ n −1 1 = − g ∂z φ n − g ∂ z φ n +1 − 2∂ z φ n + ∂ z φ n −1 12 t 2
(26)
Introducing the Taylor series expansion around time t in Eq. (26), and using Eq. (18), we recover the free surface boundary condition with O (t 4 ). Eq. (26) is an implicit scheme which has to be solved along with the linear system given in Eq. (24). At first sight seems like an iterating procedure should be used requiring solving the linear system several times per time steps. However, this can be avoided by decoupling φ n+1 and ∂z φ n+1 . The decoupling is carried out as follows: from Eq. (26) we can obtain φzn+1 as a function of φ n+1 :
∂z φ n+1 = −10∂z φ n − ∂z φ n−1 −
12 g t 2
φ n +1 − 2φ n + φ n −1
(27)
Z
This approximation is used to evaluate φ z 0 in t n+1 , and therefore, the integral of b0Z in t n+1 can be calculated as follows:
b
Z 0 n +1
= MΓ Z 0
Z n+1 φz 0
= MΓ Z 0 −10
Z n φz 0
−
Z n−1 φz 0
−
12 g t 2
φ
n +1
− 2φ + φ
Introducing Eq. (28) into Eq. (24) we obtain:
L+
12 g t 2
MΓ Z 0 φ
n +1
= MΓ Z 0
12 g t 2
n
2φ − φ
n −1
− 10
Z n φz 0
−
Z n−1 φz 0
n
n −1
n+1 R n+1 + bB + b
(28)
(29)
Eq. (29) imposes a strong coupling between the free surface boundary condition and the Laplacian equation matrix. This is carried out by modifying the system matrix L. Once the system is solved, ∂z φ n+1 at the free surface is obtained straightforward using Eq. (27). Then, once the velocity potential is solved at the new time step, the free surface elevation is computed by means of η = −(1/ g )∂t φ using the following fourth-order finite difference scheme:
η n +1 = −
1 g t
25 12
4
1
3
4
φ n +1 − 4φ n + 3φ n −1 − φ n −2 + φ n −3
(30)
3.3. Implementing the free surface boundary condition with absorption In order to include the wave damping effect, we discretize Eq. (18) as follows:
∂ z φ n +1 − ∂ z φ n −1 φ n +1 − 2φ n + φ n −1 1 n n +1 n n −1 ∂ − = − g ∂ φ − φ − 2 ∂ φ + ∂ φ κ ( x ) g z z z z 12 2t t 2
(31)
Eq. (31) is simply Eq. (27) plus a second-order finite difference in time for the absorbing term. Then ∂z φ n+1 can be obtained as:
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
∂ z φ n +1 = −
10g t g t + 6κ (x)
∂z φ n −
387
n +1 g t − 6κ (x) 12 ∂ z φ n −1 − φ − 2φ n + φ n −1 g t + 6κ (x) g t 2 + 6κ (x)t
(32)
Proceeding like in the previous section, we obtain:
L+
12 g t 2 + 6κ (x)t
MΓ Z 0 φ n+1 = MΓ Z 0
g t 2 + 6κ (x)t
− MΓ Z 0
+ b
12
10g t g t + 6κ (x)
B n +1
+ b
2φ n − φ n −1
Z n φz 0
+
g t − 6κ (x) g t + 6κ (x)
R n +1
Z n−1 φz 0
(33)
This implementation has several advantages over traditional implementations such as RK4 and AB4. The linear system has to be solved only once per time step (RK4 requires four times); the free surface numerical scheme is implicit (AB4 and RK4 are explicit); only information from two previous times at the free surface is required (AB4 requires information from four previous times); there is no restriction about the grid structure, hence unstructured meshes can be used with no restriction; and the vertical fluid velocity at the free surface is easily computed using Eq. (32). With regards to the wave absorption algorithm used in this work, we recognize that more sophisticated absorption mechanism can be found in the literature. However, despite of its simplicity, our experience tells us that the performance of this mechanism is good enough for the purpose of this work. The increase of the number of elements needed for absorbing the waves is not that big compared with the number of elements required in the analysis area, where no wave absorption exists. The element size is usually much smaller in the analysis area for accuracy, and increases gradually in the absorption area as it gets farther from the analysis area. 3.4. Radiation boundary condition When the fluid domain is unbounded, implementing a radiation boundary condition is mandatory to avoid artificial wave reflexions at the edges of the computational domain. We make use of the Sommerfeld radiation condition given in Eq. (15). Then we approximate the radiation condition by:
R n+1 φ n +1 − φ n φn =− c R t
(34)
Then the term b R can be calculated through:
bR
n+1
n+1 1 = MΓ R φ nR = MΓ R φ n − φ n−1 t
(35)
and Γ R represents the outlet boundary condition and has been assumed to be vertical. Introducing Eq. (35) into Eq. (33) we get:
L+
12 g t 2 + 6κ (x)t
MΓ Z 0 +
c
t
MΓ R φ n+1 = MΓ Z 0
g t 2 + 6κ (x)t
− MΓ Z 0 +
cR
t
12 10g t g t + 6κ (x)
MΓ R φ n + b
B n +1
2φ n − φ n −1 Z
φz 0
n
+
g t − 6κ (x) g t + 6κ (x)
Z
φz 0
n−1
(36)
A strong coupling between the Sommerfeld radiation condition and the Laplacian equation matrix has been used, as we did with the free surface previously. Hence the boundary condition is integrated within the system matrix, avoiding iterating among the equations. Despite of the simplicity of the radiation condition used, we found that its performance is good enough for the purpose of this work, and there is no need of using more sophisticated schemes. 3.5. Body boundary condition Body boundary conditions are given by Eq. (14). The movements of the bodies are assumed to be small enough so the computational domain can remind steady, as well as the normals to be bodies’ surface. Then the term b B will be calculated by:
bB
n+1
n+1 = MΓ B φ nB
(37)
388
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
3.6. Multi-body dynamics Once the velocity potential has been obtained, the pressure at any point within the fluid domain is calculated from:
P (x) = −ρ ∂t
ϕ (x) − ρ gz = −ρ ∂t ψ(x) − ρ ∂t φ(x) − ρ gz
(38)
Eq. (38) requires estimating the value of ∂t φ . Then, the same fourth-order finite difference scheme that has been used for the free surface elevation is used here:
n+1
P n+1 (x) = −ρ ∂t ψ(x)
−
ρ
25
t 12
4 1 φ n+1 (x) − 4φ n (x) + 3φ n−1 (x) − φ n−2 (x) + φ n−3 (x) − ρ gz 3
4
(39)
Integrating the pressure over the bodies’ surface, the hydrodynamic forces and moments are obtained. On the other hand, the body dynamics is given by the equation of motion:
MXtt + K X = F
(40)
where M is the mass matrix of the multi-body system; K is the hydrostatic restoring coefficient matrix of the multi-body system; F is the hydrodynamic forces induced by dynamic pressures plus any other external forces; and X represents the movements of the six degrees of freedom of each body. In the specific case where the bodies are fixed, only refracted waves are calculated and the linear system given in Eq. (36) is to be solved just once per time step. However, when solving the body dynamics along with the wave problem requires an iterative procedure since the interaction between the waves and the movements of the bodies exists, leading to wave radiation. In order to solve Eq. (40), we use an implicit Newmark’s average acceleration method:
Xntt+1 = M −1 Fn+1 − K Xntt+1 Xnt +1 = Xnt + t
1 2
Xntt +
Xn+1 = Xn + tXnt +
1 2
Xntt+1
t 2 1 2
2
(41)
Xntt +
(42) 1 2
Xntt+1
(43)
Within each time step the systems of Eqs. (41)–(43) are solved iteratively along with Eq. (36). This requires predicting the bodies velocities by solving Eq. (42), introducing this velocities into Eq. (37), solving the linear system in Eq. (36), introducing the pressure forces into Eq. (40), and repeating the process until convergence is reached. In order to accelerate convergence, we used the secant method for predicting the bodies’ velocities and positions. The algorithm implemented also allows considering nonlinear external forces acting on the bodies such as mooring forces. They are implemented so that they are evaluated and added to the right-hand side of Eq. (40) at each iteration of the solver. 4. Solver acceleration Most of the computational effort of the proposed numerical algorithm is spent in solving the linear system given in Eq. (36). Therefore, in order to reduce the computational time, we will focus on speeding up the linear solver. To do so, two techniques, that can be used combined, have been analyzed: the use of deflated solver, and the use of parallel computing in graphic processing units. 4.1. Deflated conjugate gradient Deflation works by considering piecewise constant approximations on coarse sub-domains. These piecewise constant approximations are associated to the low frequencies eigenmodes of the solution, which are also the slow convergence modes [22,23]. Then, these slow modes can be quickly approximated by solving a smaller linear system of equations that can be incorporated to the traditional preconditioned iterative solvers in order to accelerate the convergence of the solution. Hence, the use of preconditioning technique is compatible with the use of preconditioners. The first step is to divide the domain into a set of coarse sub-domains capable of capturing the slow modes. Several techniques have been developed for this purpose [22,23]. Some authors have proposed a technique based on using seedpoints to start building the sub-domains, and sub-domains are created by associating nodes to their closer seeds [22]. This technique has the disadvantage of requiring prescribing the seeds. The algorithm developed in this work is based on level structure criteria, where nodes are associated to a central node based on the level structure rooted at this central node (concept from graph theory). A maximum level structure “L”, where L refers to the neighboring level respect to the central node, is prescribed. Fig. 1 shows a domain decomposition obtained using this level structure technique with level L = 2. Ordered pair (sub-domain, level) is used to identify nodes and sub-domains they belong to, and their corresponding structure level. Notice that nodes will accept to belong to a new
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
389
Fig. 1. Sub-domain decomposition using the neighboring level algorithm.
sub-domain, if and only if its structure level in that sub-domain is lower than its level in its current sub-domain. The algorithm used to obtain such decomposition is presented step by step next: Step 0: Assigned level L + 1 to all nodes. Step 1: Start building sub-domain 0: offer level 0 to the first node of the mesh (root node for sub-domain 0). It will become node (sub-domain, level) = (0, 0). Step 2: Identify neighbors of the root node (node (0, 0)) and offer them level 1: nodes (0, 1). Step 3: Identify neighbors of nodes (0, 1), and offer them level 2: nodes (0, 2). Step 4: The procedure is repeated until the prescribed level “L” is reached. Step 5: The first node still with level L + 1 (based on the numbering of the nodes) is used as root node for sub-domain 1, and it is given level 0, becoming the root node (1, 0). Step 6: Identify neighbors of the root node (1, 0) and offer them level 1: nodes (1, 1). (Notice that some (0, L) nodes might become (1, 1).) Step 7: Identify neighbors of nodes (1, 1), and offer them level 2: nodes (1, 2). (Notice that some (0, L) or (0, L − 1) nodes might become (1, 2) if L > 2 or L − 1 > 2, respectively.) Step 8: The procedure is repeated until the nodes (1, L) are identified. Step 9: Repeat Steps 5–8 until there is no node with level L + 1. This algorithm guarantees that any node cannot have a lower structure level respect to any other root node. And the higher the prescribed level, the lower the number of sub-domains created. Although the previous algorithm provides good results, it can be improved by using it twice. In a first round, rather than using the prescribed level L, a structure level 2L is used instead. Then a first decomposition is found with a lower number of sub-domains and twice the maximum structure level. In a second round, the procedure is repeated with a maximum structure level L and using as root those nodes with higher levels. Fig. 2 shows the sub-domain decomposition obtained using the two-rounds technique. The first one shows the decomposition after the first round at level 2L, and the second shows the decomposition after the second round at level L starting of the first decomposition at level 2L. Notice that the level at the sub-domains boundaries is lower or equal than the prescribed level L, and always minimum respect to any central node (node of level zero). The recommended number of sub-domains might depend on the specific case under study. Usually in the order of hundred is a recommended value. In our specific case, the matrix of the linear system to be solved remains the same during the calculation, so that the sub-domain decompositions have to be carried out just once. The deflated system has to be solved every iteration of the solver. However, since in our case study the linear system of equation to be solved remains the same along the simulation, so does the deflated matrix. Then, the inverse of the deflated
390
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 2. Sub-domain decomposition using the two-rounds neighboring level algorithm. (a) Decomposition after first round. (b) Decomposition after second round.
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
391
matrix can be calculated just once in the first time step and be stored. Hence, the resolution of the deflated system in each iteration can be substituted by a matrix vector multiplication. 4.2. GPU acceleration The fast development of the videogame industry is leading to higher computational capabilities of GPUs. Hence, their use for heavy computations in computational fluid dynamics is becoming more common (see [26,27]). The implementation carried out in this work is based on the well-known CUDA, a parallel computing platform and programming model invented by NVIDIA. It is focused in speeding up the iterative solver using the functions provided by the CUBLAS and CUSPARSE libraries. Deflated and non-deflated preconditioned conjugate gradient (CG) algorithm has been implemented in CUDA, along with a sparse approximate inverse (SPAI) preconditioner [28] and incomplete LU decomposition (ILU) preconditioner. However, using the sparse lower and upper triangular solvers provided in the CUSPARSE library for ILU preconditioning resulted in poor performance. This poor performance can be expected since solving triangular system is not suitable for massive parallelization across tens of thousands threads [28], as required by GPUs in order to hide memory latency. Therefore we will omit the use of the ILU preconditioner when reporting GPU results. Since the system matrix remains the same along the computation, preconditioners are calculated just once and in the first time step. Then, computational time invested in calculating preconditioners is negligible when compared to the total time spent in the linear solver during the whole simulation. 5. Numerical results 5.1. Waves refracted by a vertical circular cylinder In order to prove the efficiency of the proposed numerical schemes, we will solve the problem of a monochromatic wave interacting with a fix bottom mounted circular cylinder. We have chosen this case study because there exists an analytical solution to this problem. This analytical solution was found by McCamy and Fuchs in 1954 [29]. The velocity potential for a monochromatic wave traveling along the O X axis in cylindrical coordinates is given by:
φ I = Re
i Ag cosh(k( H + z))
ω
cosh(kH )
εm J m (kr ) cos(mθ) exp(i ωt )
(44)
m0
where (r , θ, z) are the cylindrical coordinates; A is the wave amplitude; ω is the wave frequency; H is the water depth; J m are the Bessel functions of the first kind; and εm = 1 if m = 0, and εm = 2(−i )m if m > 0. The diffracted wave velocity potential is given by:
i Ag cosh(k( H + z))
S
φ = Re
ω
cosh(kH )
−εm
m0
(kR ) Jm (2)
H m (kR )
(2 )
Hm
(kr ) cos(mθ) exp(i ωt )
(45)
where ( ) denotes derivatives respect to the argument kR; R is the radius of the cylinder; and H m is the Hankel function of the second kind. Summing up Eqs. (44) and (45), we obtain the analytical solution to the waves diffracted by vertical circular cylinder problem: (2)
φ = Re
i Ag cosh(k( H + z))
ω
cosh(kH )
εm J m (kr ) −
m0
(kR ) Jm (2)
H m (kR )
(2 )
H m (kr ) cos(mθ) exp(i ωt )
(46)
The pressure value at any point is obtained by Eq. (38). We can further decompose the pressure into the hydrostatic pressure and the dynamic pressure P φ = −ρ ∂t φ 1 , which is given by:
P φ = Re A ρ g
cosh(k( H + z))
cosh(kH )
εm J m (kr ) −
m0
(kR ) Jm (2)
H m (kR )
(2 )
H m (kr ) cos(mθ) exp(i ωt )
(47)
The horizontal force induced over the cylinder is obtained by integration of P φ over the cylinder. This force is given by the following equation:
F x = Re 2i A R ρ g π
tanh(kH ) k
J 1 (kR ) −
J 1 (kR ) (2)
H 1 (kR )
(2 )
H1
(kR ) exp(i ωt )
(48)
Next we compare numerical results obtained by the analytical solution with numerical results obtained by our FEM implementation for the specific case of R = 1 m, H = 1 m, A = 0.1 m, L = 2 m. Using ρ = 1025 kg/m3 , g = 9.81 m/s2 and by mean of the dispersion relation for first-order waves, we obtain the frequency value ω = ( g π tanh(π ))1/2 = 5.5411 rad/s, and the wave period T = 1.1339 s.
392
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 3. Contour lines of free surface elevation at time = 15 s. Analytical: solid line; FEM: dot line.
Table 1 ISSC TLP particulars. XG (m) YG (m) ZG (m) Ixx/Mass (m2 ) Iyy/Mass (m2 ) Izz/Mass (m2 )
0 0 3 38.876 38.876 42.420
The simulation is carried out using a circular domain with a radius of 6 meters. The mesh size is set to 0.1 m in an inner volume within a radius of 2 meters. In the outer volume, the mesh size grows smoothly up to 0.5 meters. The mesh created consists of 35 914 nodes and 188 715 tetrahedral elements. Fig. 3 compares the contour lines of the free surface elevation at any time t = nT . Notice that the FEM solution mostly lies over the analytical solution. Fig. 4 compares the pressure distribution over the cylinder obtained by the analytical solution and the FEM solution. Both pressure distributions are shown using the same color scale, with a maximum value of 950 Pa and a minimum of −1700 Pa. Fig. 5 compares the force induced over the cylinder obtained by the analytical solution and FEM. It can be seen that the forces obtained in both ways are quite close to each other. 5.2. Freely floating tension leg platform (TLP) In this section we analyze the seakeeping behavior of a freely floating tension leg platform (TLP). The platform used is the ISSC TLP [30]. The mass is obtained internally to equal the displacement of the platform. The gravity center position and radii of inertia are provided in Table 1. Fig. 6 shows the mesh used for the present case study. The gravity value used is g = 9.80665 m/s2 , the water density used is ρ = 1025 kg/m3 , and water depth is assumed to be infinite. The hydrostatic tensor K is calculated by the corresponding boundary integrals. We are interested in how the body moves when excited by regular waves. In this specific case where regular waves are the only excitation, the TLP is expected to respond harmonically with the same frequency as those of the incident waves. Since the method developed solves the problem in the time domain, initial condition and initialization will be important to avoid long transient periods. To do so, the fluid and TLP are initialized at rest, and the incident waves amplitudes are increased slowly during an initialization period. Simulations were carried out for periods ranging between eight and fifteen seconds for three different wave headings. Fig. 7 compares the response amplitude operators (RAOs) obtained by the present FEM model and RAOs obtained by the well known program WAMIT [31], which is based on the BEM and solves in the frequency domain. A mesh convergence analysis was carried out with the FEM model to make sure convergence was reached. Results from WAMIT were obtained for the case described in [31], where no information on the mesh convergence analysis is provided. Anyhow, results show very good agreement between both methods.
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
393
Fig. 4. Pressure induced on the cylinder by the velocity potential at time = 15 s. Comparison between analytical and FEM results.
5.3. Seakeeping of a GVA 4000 semisubmersible platform Next we carry out comparison between experimental data [32] and our numerical regarding the seakeeping behavior of the GVA 4000 semisubmersible platform. The results to be compared are the heave and pitch response amplitude operators in heading seas, with a range of wave periods between 6 and 32 seconds. The platform displacement is 25 940 tonnes. The center of gravity is located 21.35 m above the keel, and the horizontal position corresponds to the geometric center of the platform. The radii of inertia are rxx = 30.40 m, r y y = 31.06 m, and rzz = 37.54 m. The geometry of the GVA 4000 can be found in [32]. Based on [32], the model tests were carried out with the surge movement constraint by the action of a pre-stressed spring whose mission is to keep in place the structure during testing. Moreover, this spring creates also a pitching moment. Therefore, the pitch movement will be influenced not just by the waves, but also by the surge movement and the spring. Since we have no data regarding the pre-stressed spring used by the model basin, we cannot include it in the simulation. Instead, simulations have been carried out in two cases: no surge and free surge. Fig. 8 compares the experimental results and numerical FEM results. The experimental results lay within the numerical results as expected.
394
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 5. Horizontal force induced over the cylinder. Analytical (solid line) and FEM (circle).
Fig. 6. ISSC TLP and free surface meshes.
5.4. Deflated solver In order to check the performance of deflated preconditioned conjugate gradient in our problems, simulations using the ISSC TLP platform have been carried out using five different meshes. For each case, different levels of deflation have been used, and they have been compared to a non-deflated case. Table 2 provides the particulars of each case. Solver deflation is not incompatible with the use of preconditioners. Therefore, comparisons are made using the deflated conjugate gradient along with an ILU preconditioner. In the case of the deflated solver, different structure levels have been used, resulting in different number of sub-domains. In all cases, the same processor has been used. Fig. 9 shows the reduction in iterations for each case as the number of sub-domains changes. Tables 3–7 show the average number of iterations and speed up respect to the non-deflated case. The speed up has been obtained as the ratio of CPU times taken by the linear solver in the non-deflated and deflated cases. It can be observed that the number of iterations decreases as the dimension of the deflated subspace increases. This reduction in the number of iterations when using an ILU preconditioner indicates that the proposed technique for building sub-domains is performing well.
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 7. Response amplitude operator for freely floating TLP. Circles: WAMIT; triangles: FEM.
395
396
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 8. Heave and pitch response amplitude operators for the GVA 4000. Dots: experimental data; solid and dashed line: FEM.
Table 2 Particular of each case.
Number of nodes Number of elements h (m) t (s)
Case 1
Case 2
Case 3
Case 4
Case 5
17 804 101 572 4 0.75
51 333 292 705 2 0.5
153 758 792 372 0.5 0.1
459 538 2 335 374 0.35 0.075
913 149 4 640 638 0.25 0.05
Fig. 9. Percentage of iterations versus ratio of number of nodes to number of sub-domains.
However, when using deflated solvers, a few operations are added in every solver iteration. Hence reducing iterations is not synonym of reducing CPU time. Based on the results obtained, it can be said that the larger the system, the larger the speed up that can be obtained, and also the larger the number of sub-domains must be. However, care must be taken because a wrong dimension of the deflated subspace might end up in increasing the CPU time. Moreover, for smaller cases, deflation might even not be capable of speeding up at all. Deflation has been proved to be quite effective in reducing the number of iterations, which means that the sub-domain decomposition is working well. However, in our case study, deflation is not so effective in reducing CPU time. Moreover,
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
397
Table 3 Comparative among deflated solvers and non-deflated. Case 1: Number of nodes = 17 804 Structure level
Number of sub-domains
Average number of iterations
Speed up
2 3 4 5 6 7
836 316 174 100 66 49
15.89 18.18 19.36 21.34 22.39 23.74
×0.556 ×0.963 ×0.989 ×0.894 ×0.871 ×0.829
28.75
×1
Non-deflated
Table 4 Comparative among deflated solvers and non-deflated. Case 2: Number of nodes = 51 333 Structure level
Number of sub-domains
Average number of iterations
Speed up
2 3 4 5 6 7 8 9 10 11 12
2419 963 506 299 201 146 107 73 59 42 30
17.34 20.23 22.00 24.15 25.20 26.17 27.77 29.04 28.84 30.36 32.27
×0.310 ×0.908 ×1.081 ×1.055 ×1.023 ×0.943 ×0.907 ×0.889 ×0.912 ×0.863 ×0.789
38.68
×1
Non-deflated
Table 5 Comparative among deflated solvers and non-deflated. Case 3: Number of nodes = 153 758 Structure level
Number of sub-domains
Average number of iterations
Speed up
4 5 6 7 8 9 10 11 12
2101 1153 675 405 243 154 96 60 45
20.82 22.82 24.71 28.59 33.8 37.5 42.06 49.03 46.83
×0.77 ×1.20 ×1.30 ×1.17 ×1.00 ×0.91 ×0.82 ×0.71 ×0.72
49.53
×1
Non-deflated
Table 6 Comparative among deflated solvers and non-deflated. Case 4: Number of nodes = 459 538 Structure level
Number of sub-domains
Average number of iterations
Speed up
6 7 8 9 10 11 12
1936 1128 641 374 232 137 82
25.46 25.58 32.66 39.38 42.86 48.90 52.76
×1.01 ×1.39 ×1.22 ×1.03 ×0.94 ×0.84 ×0.76
60.86
×1
Non-deflated
398
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Table 7 Comparative among deflated solvers and non-deflated. Case 5: Number of nodes = 913 149 Structure level
Number of sub-domains
Average number of iterations
Speed up
6 7 8 9 10 11 12
3837 2229 1271 717 413 246 147
26.70 29.95 31.60 39.55 44.55 48.50 56.55
×0.54 ×1.10 ×1.40 ×1.20 ×1.07 ×0.99 ×0.83
71.85
×1
Non-deflated
Table 8 Mesh properties and case study numerical parameters. Number of nodes Case 1 Case 2 Case 3
137 556 369 410 1 181 302
Number of elements 799 665 2 152 774 6 878 083
Characteristic element size (m)
Time step (s)
0 .2 0 .1 0.05
0.1010 0.0714 0.0505
depending on the level structure used, CPU time can be even degraded respect to the non-deflated solver. Therefore, choosing the right level structure is mandatory to optimize the performance of the deflation algorithm. 5.5. Multi-body simulation In this section we will simulate an array of sixteen freely floating bodies in the presence of a regular wave. The purpose of this simulation is to show that our approach can handle multi-body simulations in a reasonable time. Let’s recall that when solving in the frequency domain, it is required to compute how each body interacts with all the others. Therefore, the number of interactions will grow with the number of bodies squared, making the computational effort to grow very quickly as the number of bodies increases. The case study consists of simulating the dynamics of an array of sixteen freely floating cylinders in the presence of a regular wave. Each cylinder is one meter in radius, and half meter in draft. Each one is freely floating without restriction. Then, since they are freely floating, the mass of each cylinder must equal the mass of the displaced water. Radii of gyration respect to their own center of gravity are equal to one. The incident wave has wave period of two seconds, an amplitude of ten centimeters, and an incident direction of 22.5◦ . The cylinders are placed on a regular pattern and the distance between the centers of adjacent cylinders is three meters. A fifty second simulation is carried out, where the first twenty-five seconds correspond to an initialization period where the amplitude of the incident wave increases smoothly up to ten centimeters. The absorption area starts at a distance of 12 meters from the origin of coordinates. The time step used in each case has been calculated such as the dimensionless grid number g t 2 /h = 0.5, where h represents the characteristic element size. The case study is simulated with three levels of grid refinement. Computational time spent to solve the linear system of equations in every time step during the simulations was measured and is reported in the next section. The solver used is a non-deflated preconditioned conjugate gradient. Table 8 provides the particulars of the meshes and numerical parameters used, and Fig. 10 shows the three different meshes used in this study. Fig. 11 shows the free surface elevation at the end of the simulation. Fig. 12 compares the movements of the cylinder located in the upper right corner for the three meshes used. While some difference is found for the coarser mesh, results are very close for cases 2 and 3 as expected. 5.6. GPU solver acceleration We use the case study described in Section 5.5 to carry out speed-up analyses when using different types of GPU/CPU architecture, as well as different types of preconditioners. When computing in parallel in the GPU, the sparse approximate inverse (SPAI) preconditioner will be used. When carrying out serial computation in the CPU, the SPAI and ILU preconditioners will be used. Let’s remind that the matrix system does not change along the simulation. Therefore, the preconditioner is only requested to be computed once. Also, let’s remind that the use of the ILU preconditioner in GPU architectures is not recommended due to its poor parallelization across thousands of threads as required by GPUs to hide memory latency. In order to compare computational performance, we carried out simulation in four different platforms: two GPU platforms, and two CPU platforms. Table 9 provides the particulars of each platform used. Since hardware evolve quite fast, it is necessary to take into account whether comparison is made between the same generation hardware or not. We have named GPU1 and CPU1 to the older platforms, belonging more or less to the same generation, and named CPU2 and GPU2 to the newer platforms, also belonging to the same generation. Therefore, when
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 10. Multi-body simulation. (a) Case 1 mesh; (b) case 2 mesh; (c) case 3 mesh.
Fig. 11. Multi-body simulation: free surface elevation at time = 49.8 seconds.
399
400
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
Fig. 12. Multi-body simulation: movements of the body located at the upper right corner. Comparison using different levels of mesh refinement.
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
401
Table 9 Computing platforms used. Platform
Description
GPU1 CPU1 GPU2 CPU2
NVIDIA GTX 280 Intel (R) Core (TM) 2 CPU Q9400 @ 2.66 GHz 2.67 GHz NVIDIA TESLA C2075 Intel (R) Core (TM) i7-3930K CPU @ 3.20 GHz 3.20 GHz
Table 10 Computational time spent for a fifty second simulation of an array sixteen floating cylinders. Case 1
Platform GPU1 CPU1 GPU2 CPU2
Case 2
GPU1 CPU1 GPU2 CPU2
Case 3
GPU1 CPU1 GPU2 CPU2
Preconditioner SPAI ILU SPAI ILU SPAI ILU SPAI ILU SPAI ILU SPAI ILU
Solver time (s) 564 3256 2868 282 1217 1123 2012 15 051 13 549 1060 6052 5198 8519 85 605 75 223 4743 37 479 30 110
Total time (s) 695 3325 2938 314 1249 1154 2284 15 312 13 812 1178 6165 5311 9854 86 740 76 307 5302 37 941 30 588
Solver time (%) 81.2 97.9 97.6 89.7 97.5 97.3 88.1 98.3 98.1 90.0 98.2 97.9 86.4 98.7 98.6 89.4 98.8 98.4
Table 11 Speed up obtained by using GPU1 respect to serial execution with CPU1. Reference case CPU1 + SPAI GPU1 + SPAI (speed up)
CPU1 + ILU
Case 1
Case 2
Case 3
Case 1
Case 2
Case 3
×5.77
×7.48
×10.04
×5.09
×6.73
×8.83
Table 12 Speed up obtained by using GPU2 respect to serial execution with CPU2. Reference case CPU2 + SPAI GPU2 + SPAI (speed up)
CPU2 + ILU
Case 1
Case 2
Case 3
Case 1
Case 2
Case 3
×5.32
×5.71
×7.90
×3.98
×4.90
×6.35
comparing computational time, it must be taken into account that the newer platforms were bought in 2012, while the older ones were bought in 2009. Table 10 shows the computational time required for simulating fifty seconds with three different meshes and under different solver strategies. Notice that more than eighty percent of the computational time is spent in the linear solver. Therefore, speeding up this part of the code is the key point for acceleration. Tables 11 and 12 show the speed up obtained as the ratio of time spent in the linear solver when using the same generation CPU and GPU platforms. When comparing GPU1 versus CPU1, the minimum speed up obtained in the linear solver is ×5.09, ×6.73, and ×8.83 for case 1, 2 and 3, respectively. On the other hand, when comparing GPU2 versus CPU2, the minimum speed up obtained in the linear solver is ×3.98, ×4.90, and ×6.35 for case 1, 2, and 3, respectively. As expected, the speed up increases as the size of the case study increases. When comparing GPU1 versus GPU2, it can be observed that the speed up obtained using the newer generation is in the order of 2. We should also point out that the price of the newer is in the order of ten times more expensive. Comparing the performance of GPU1 respect to CPU2, we observe that GPU1 performs better. It gets speed ups of ×2, ×2.58, and ×3.53 in each case study. The fifty second simulation can be carried out with enough accuracy in some 20 minutes using GPU2. Since the length scale of the model is one meter, those fifty seconds are equivalent to 500 seconds if the radius of each cylinder were one hundred meters. This leads to a ratio between computational time and real time around 2 for offshore engineering problems.
402
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
6. Conclusions A FEM solver for wave–structure interaction in the time domain and on unstructured meshes has been presented. The wave modeling is based on Stokes’ perturbation theory, which allows keeping the same computational domain along the simulation. In this work, a fourth-order compact Padé scheme has been successfully used for solving the free surface boundary condition. Moreover, it allows imposing the free surface boundary condition as a Neumann type, leading to a strong coupling with the Laplacian matrix which avoids the need of iterating within each time step. FEM results have been compared to analytical results available for the problem of wave diffracted by a circular vertical cylinder. The agreement between both solutions shows that the algorithms developed in this work perform well. Response amplitude operators for a freely floating TLP structure obtained by the present FEM and BEM also compare well. Moreover, comparison of response amplitude operators obtained experimentally for a semisubmersible platform shows that the present FEM code is capable of providing accurate results in real problems. Deflation has been proved to be quite effective in reducing the number of iterations required by the conjugate gradient. However, in our case study, deflation is not so effective in reducing CPU time as it is in reducing solver iterations. Solver deflation can speed up the solution and, the larger the size of the problem, the larger the speed up. But, depending on the level structure used, CPU time can be even degraded respect to the non-deflated solver. Therefore, choosing the right level structure based on the size of the problem is mandatory to optimize the performance of the deflated conjugate gradient. GPUs have been used for solver acceleration. For this purpose, deflated and non-deflated preconditioned conjugate gradient (CG) algorithm has been implemented in CUDA, along with a sparse approximate inverse (SPAI) preconditioner and incomplete LU decomposition (ILU) preconditioner. Results indicate that GPUs-based solvers, using SPAI preconditioners, can perform faster than CPUs, and the speed ups obtained in this work are in the range of 4–9, depending on the size of the problem, and the generation of GPU/CPU used. Multi-body simulation has been carried out using sixteen freely floating cylinders. It has been shown that the computational time allows simulating multi-body ocean engineering problems closed to real time. Therefore, this sort of approach could be used for operational purposes in the near future. Acknowledgements This study was partially supported by the Ministry for Science and Innovation of Spain in the AIDMAR project CTM2008-06565-C03-01, and the Office of Naval Research Global (USA) by the Navy Grant N62909-10-1-7053. The authors also want to acknowledge Prof. Clauss and Dr. Schmittner, for providing the main characteristics of the GVA 4000 model used in the experimental tests. References [1] X. Cai, H.P. Langtangen, B.F. Nielsen, A. Tveito, A finite element method for fully nonlinear water waves, J. Comput. Phys. 143 (1998) 544–568. [2] E. Oñate, J. García, A finite element method for fluid–structure interaction with surface waves using a finite calculus formulation, Comput. Methods Appl. Mech. Eng. 191 (2001) 635–660. [3] R. Löhner, C. Yang, E. Oñate, On the simulation of flows with violent free surface motion, Comput. Methods Appl. Mech. Eng. 195 (2006) 5597–5620. [4] R. Löhner, C. Yang, E. Oñate, On the simulation of flows with violent free surface motion and moving objects using unstructured meshes, Comput. Methods Appl. Mech. Eng. 53 (2007) 1315–1338. [5] J. García, A. Valls, E. Oñate, ODDLS: A new unstructured mesh finite element method for the analysis of free surface flow problems, Int. J. Numer. Methods Fluids 76 (9) (2008) 1297–1327. [6] G.G. Stokes, On the theory of oscillatory waves, Trans. Camb. Philos. Soc. 8 (1847) 441–455. [7] G.X. Wu, R. Eatock Taylor, Finite element analysis of two-dimensional non-linear transient water waves, Appl. Ocean Res. 16 (1994) 363–372. [8] G.X. Wu, R. Eatock Taylor, Time stepping solution of the two-dimensional non-linear wave radiation problem, Ocean Eng. 22 (1995) 785–798. [9] D.M. Greaves, A.G.L. Borthwick, G.X. Wu, R. Eatock Taylor, A moving boundary finite element method for fully nonlinear wave simulation, J. Ship Res. 41 (1997) 181–194. [10] I. Robertson, S. Sherwin, Free-surface flow simulation using hp/spectral elements, J. Comput. Phys. 155 (1999) 26–53. [11] Q.W. Ma, G.X. Wu, R. Eatock Taylor, Finite element simulation of fully nonlinear interaction between vertical cylinders and steep waves – part 1: methodology and numerical procedure, Int. J. Numer. Methods Fluids 36 (2001) 265–285. [12] Q.W. Ma, G.X. Wu, R. Eatock Taylor, Finite element simulation of fully nonlinear interaction between vertical cylinders and steep waves – part 2: numerical results and validation, Int. J. Numer. Methods Fluids 36 (2001) 287–308. [13] J.H. Westhuis, The numerical simulation of nonlinear waves in the hydrodynamic model test basin, Ph.D. thesis, Universiteit Twente, The Netherlands, 2001. [14] P.X. Hu, G.X. Wu, Q.W. Ma, Numerical simulation of nonlinear wave radiation by a moving vertical cylinder, Ocean Eng. 29 (2002) 1733–1750. [15] M.S. Turnbull, A.G.L. Borthwick, R. Eatock Taylor, Wave–structure intersection using coupled structured–unstructured finite element meshes, Appl. Ocean Res. 25 (2003) 63–77. [16] G.X. Wu, Z.Z. Hu, Simulation of nonlinear interactions between waves and floating bodies through a finite element based numerical tank, Proc. R. Soc. Lond. Ser. A 460 (2004) 2797–2817. [17] C.Z. Wang, B.C. Khoo, Finite element analysis of two-dimensional nonlinear sloshing problems in random excitations, Ocean Eng. 32 (2005) 107–133. [18] C.Z. Wang, G.X. Wu, Time domain analysis of second order wave diffraction by an array of vertical cylinders, J. Fluids Struct. 23 (4) (2007) 605–631. [19] C.Z. Wang, G.X. Wu, Analysis of second-order resonance in wave interactions with floating bodies through a finite-element method, Ocean Eng. 35 (2008) 717–726. [20] S. Yan, Q.W. Ma, QALE-FEM for modelling 3D overturning waves, Int. J. Numer. Methods Fluids (2009), http://dx.doi.org/10.1002/fld.2100. [21] H. Song, L. Tao, S. Chakrabarti, Modelling of water wave interaction with multiple cylinders of arbitrary shape, J. Comput. Phys. 229 (2010) 1498–1513.
B. Serván-Camas, J. García-Espinosa / Journal of Computational Physics 252 (2013) 382–403
403
[22] R. Aubry, F. Mut, R. Löhner, J.R. Cebral, Deflated preconditioned conjugate gradient solvers for the Pressure–Poisson equation, J. Comput. Phys. 227 (2008) 10196–10208. [23] F. Mut, R. Aubry, G. Houzeaux, J. Cebral, R. Löhner, Deflated preconditioned conjugate gradient solvers: extensions and improvements, in: 48th Aerospace Sciences Meeting and Exhibit, Orlando, FL, January 2010. [24] O.C. Zienkiewicz, R.L. Taylor, J.Z. Zhu, The Finite Element Method: Its Basis and Fundamentals, Butterworth–Heinemann, ISBN 0750663200, 2005. [25] S.K. Lele, Compact finite difference schemes with spectral-like resolution, J. Comput. Phys. 103 (1992) 16–42. [26] F. Mossaiby, R. Rossi, P. Dadvand, S. Idelsohn, Open CL-based implementation of an unstructured edge-based finite element convection–diffusion solver on graphics hardware, Int. J. Numer. Methods Eng. 89 (13) (2011) 1635–1651. [27] N. Bell, M. Garland, Efficient sparse matrix-vector multiplication on CUDA, NVIDIA Technical Report NVR-2008-004, December 2008. [28] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, 1996. [29] R. McCamy, R. Fuchs, Wave forces on piles: a diffraction theory, Tech. Memo No. 69, U.S. Army Corps of Engrs., 1954. [30] R. Eatock Taylor, E.R. Jefferys, Variability of hydrodynamic load predictions for a tension leg platform, Ocean Eng. 13 (5) (1986) 449–490. [31] WAMIT User Manual, http://www.wamit.com/manual.htm. [32] G.F. Clauss, C. Schmittner, K. Stutz, Time-domain investigation of a semisubmersible in rouge waves, in: 21st International Conference on Offshore Mechanics and Arctic Engineering, Oslo, Norway, June 23–28, 2002, OMAE2002-28450.