ARTICLE IN PRESS Engineering Analysis with Boundary Elements 34 (2010) 673–679
Contents lists available at ScienceDirect
Engineering Analysis with Boundary Elements journal homepage: www.elsevier.com/locate/enganabound
A fast wavelet-multipole method for direct BEM Jinyou Xiao a,, Johannes Tausch b a b
College of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China Department of Mathematics, Southern Methodist University, Dallas, TX 75275, USA
a r t i c l e in fo
abstract
Article history: Received 20 August 2009 Accepted 17 January 2010 Available online 21 March 2010
A fast wavelet-multipole method (WMM) has been developed to achieve further speedup for the boundary element method in solving the direct boundary integral equations. The main idea is to compute the right-hand-side vector by the fast multipole method and to solve the linear system by the wavelet compression method. By using the variable order moments, almost linear complexity can be obtained. The primary advantages of the present WMM lie in that it (1) permits efficient implementation; (2) is universal in handling practical problems with complicated geometries. Numerical examples with around 1 million unknowns, performed on nontrivial geometries, clearly show that the WMM can shorten the total computational time by reducing the time for computing the right-hand-side. & 2010 Elsevier Ltd. All rights reserved.
Keywords: Wavelet compression method Fast multipole method Direct boundary integral equation Boundary element method
1. Introduction There are indirect and direct formulations of the boundary integral equation (BIE) that describe a physical problems equivalently. The direct BIE has gained popularity among engineers and scientists, since the unknowns are the same as the real physical variables that govern the equations. By using the boundary element method (BEM) the direct BIE can be discretized into a linear system of equations. The coefficient matrix is fully populated, and the right-hand-side (RHS) vector is obtained by a matrix–vector product with a dense matrix. Therefore, the total computational complexity for the numerical treatment of the direct BIE by the conventional BEM is at least OðN2 Þ, where N is the number of unknowns. The quadratic complexity can be reduced to almost linear, i.e. a OðNlog NÞa Z 0 , by using the fast boundary element techniques emerged in the past two decades. Representative examples of such fast techniques are fast multipole method (FMM) [1,2,14] and wavelet compression method (WCM) [7–10,15]. Other techniques include Hmatrices [3,4], adaptive cross approximation [13], precorrected-FFT method [5,6], etc. Here we note that Hmatrices is of similar idea with the FMM. Thus the following discussion about the FMM also applies to the Hmatrices. The essential of the aforementioned acceleration techniques is to use the iterative solver and to perform matrix–vector products approximately so that the complexity of one iteration can be reduced to almost linear. Even so, the overall efficiency of those techniques are generally different. We will focus on the FMM and
Corresponding author.
E-mail addresses:
[email protected] (J. Xiao),
[email protected] (J. Tausch). 0955-7997/$ - see front matter & 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.enganabound.2010.01.009
the WCM. Comparisons of the two methods can be found in [17], where we show that, although the setup of the FMM is faster than of the WCM, the matrix–vector application in the WCM is much more efficient than in the FMM. Consequently, we conclude that, from the viewpoint of saving the computational time, the FMM is more suitable for problems with small number of iterations, while the WCM is more efficient for problems with large number of iterations. This conclusion motivates us to exploit the combination of the two methods in order to achieve further speed-up in solving direct BIEs by BEM. Our idea is to compute the RHS vector by the FMM and to solve the resulting linear system by the WCM, which is referred to as wavelet-multipole method (WMM). Since in many practical problems relatively large numbers of iterations are necessary to get acceptable solutions, we can expect the WMM to reduce the total computational time. Theoretically, a WMM can be obtained by combining any one version of FMM and one WCM; for example, we can form a WMM by the FMM in [2] and the WCM in [9]. However, this is not a good combination from the viewpoint of simplicity and efficiency, because the principles of the two method are generally different and much extra work has to be done in setting up the WMM. Moreover, the WCM in [9] and its kin are not suitable for problems with very complex geometries due to the need of parametric surfaces. Thus, the key to form an efficient WMM is to use FMM and WCM which are simple in implementation and more universal in handling realistic problems. We propose to use the FMM in [14] and the WCM in [11]. Both methods use Taylor expansion of the Green’s function and can achieve almost optimal complexity by using the variable order moments. In addition, they can be transformed from each other, which is detailed in [17]. It is worthy noting that a WMM can also be constructed using the WCM in [12] and the FMM in [2], and it
ARTICLE IN PRESS 674
J. Xiao, J. Tausch / Engineering Analysis with Boundary Elements 34 (2010) 673–679
may have similar complexity with the WMM in this paper. However, its implementation is more complicated. To our knowledge, the WMM is the first reported work on the combination of the WCM and the FMM in order to further promote the efficiency of the BEM. We expect this to be the main contribution of this paper. For simplicity in presentation, our discussion is focused on the direct BIE of Laplace equation, but the method can be applied to solve direct BIEs in other problems including Helmholtz, elasticity and Stokes flow, probably with minor modifications. We organize the paper as follows. The model problem as well as the main idea of the WMM are presented in Section 2. The basic settings of the FMM and the WCM are described in Sections 3 and 4, respectively. Numerical results are provided in Section 5 to test the performance of the WMM. Section 6 concludes the paper.
The motivation of the WMM comes from work [17] which shows that the WCM can be transformed into a FMM. Following this observation, the efficiency of the WCM and the FMM was compared with many large-scale practical examples, see [17–19]. We found that:
The CUP time requirement of FMM is lower than of the WCM
2. WMM for direct BEM
only when the number of iterations for solving the linear system is small (for example, less than 50), because in WCM most computational time is spent on computing the sparse matrix, once it is obtained and stored, the iteration is very fast. In FMM, however, one has to perform large amount of re-computations in order to reduce the memory requirement, thus the time consumption of an iteration in FMM is much longer than in WCM; see Table 2 in [17]. By using the a-posteriori compression, the memory requirement of the (sparse) system matrix of the WCM can be lower than the near-field matrix of the FMM.
2.1. Direct BEM As a model equation (BIE) Z CBIE ðxÞuðxÞ ¼
problem we consider the direct boundary integral of the three-dimensional Laplace equation @ Gðx;yÞqðyÞ þ Gðx; yÞuðyÞ dGy ; x A G: ð1Þ @ny G
where CBIE is a constant whose value is dependent on the geometry of the position of the source point x ¼ ðx1 ; x2 ; x3 Þ; u and q are the temperature and heat flux, respectively; ny is the outward normal at the field point y and Gðx; yÞ the Green’s function Gðx;yÞ ¼ GðrÞ ¼
1 ; 4pr
SN
@Gðx;yÞ 1 @r ¼ ; @ny 4pr 2 @ny
r ¼ jxyj:
Let T h ¼ i ¼ 1 ti be a quasi uniform triangulation of G, where h ¼ maxN i ¼ 1 diamðti Þ. We can define the piecewise constant boundary element space Xh ¼ spanfwi : i ¼ 1; 2; . . . ; Ng; ð2Þ pffiffiffiffiffiffiffi in which wi ¼ 1= jti j on panel ti and vanishes elsewhere. Using the standard Galerkin method based on Xh, BIE (1) can be discretized as " # " # q1 u1 ¼ ½G1 G2 : ð3Þ ½H1 H2 q2 u2 The subscripts in (3) are used to indicate different boundary conditions. Here we simply suppose u2 and q1 are prescribed, then the linear system for the unknowns is Aa ¼ b;
ð4Þ
where "
A ¼ ½H1 G2 ;
# q1 ; a¼ u2
"
# q1 b¼B ; u2
B ¼ ½G1 H2 :
ð5Þ
In the conventional BEM, the coefficient matrix A is densely populated and the complexity for solving (4) is at least OðN2 Þ. Moreover, the complexity for computing the known right-hand side vector b also scales like OðN 2 Þ. 2.2. Wavelet-multipole method The basic idea of the wavelet-multipole method (WMM) for direct BIE (1) is to compute the right-hand side vector b using the FMM while solving the linear system (4) using the WCM. The aim is to achieve further speed-up in solving large-scale problems using BEM.
Above comparisons indicate that the WCM is more suitable for problems which need relatively large number of iterations to converge, whereas the FMM is more efficient for small number of iterations. Now we return to (4). We note that only one iteration is needed to compute vector b, thus the FMM is a better choice. On the other hand, in many real-life problems relatively large number of iterations is required to get a reasonable solution a, so we propose to solve (4) by the WCM. It is clear that the WMM has its advantage only when a certain number of iterations is required to solve (4). Our experience shows that this number is in most case around 50, see e.g., [17]. For problems requiring less than 50 iterations, a full FMM scheme in which not only the vector b is computed but the system (4) is also solved by FMM, may outperform the WMM. However, in most large-scale practical problems with complicated geometries and boundary conditions, usually large numbers (larger than 50) of iterations are necessary, thus we can expect the WMM to achieve further speed-up in solving direct BIEs; see numerical experiments in Section 5. In the following, we will consider the implementation of the WMM. Efficiency and simplicity are two factors we concern most. We will establish a WMM using the FMM in [14] and the WCM in [11], because: (1) the two methods have many common points, for example, they use the same tree structure, compute the moment matrices of cubes and use the same multipole-to-local Table 1 Computing times (s), memory usage (MB) and L2-error in Example 5.1. N
Trhs
Tmatrix
Tit
Ttotal
Nit
Memory
L2-error
648 2592 10,368 41,472 165,888 663,552
0 1 4 18 72 307
0 3 21 134 722 3278
0 0 0.02 0.12 0.60 2.33
0 4 25 154 806 3665
8 9 5 5 3 3
1.8 12.7 59.1 248.2 1031.6 4515.3
0.021143 0.008453 0.001125 0.000657 0.000285 0.000149
Table 2 Computing times (s), memory usage (MB) and L2 error in example 5.2. N
Trhs
Tmatrix
Tit
Ttotal
Nit
Memory
L2-error
11,788 47,152 188,608 754,432
4 20 79 316
6 44 188 710
0.01 0.06 0.24 1.12
11 74 319 1325
106 142 194 250
35.2 141.7 567.1 2303.9
0.00735 0.00240 0.00170 0.00128
ARTICLE IN PRESS J. Xiao, J. Tausch / Engineering Analysis with Boundary Elements 34 (2010) 673–679
transformations, this permits an efficient implementation of the WMM; (2) they are based on Taylor expansion of Green’s functions, which allows a unified treatment of a large class of Green’s functions associated with elliptic boundary value problems.
a
coefficient ln;n0 is given by
lan;n0 ¼
3.1. Tree structure and notations The construction of the multilevel tree begins with embedding T h into a level-0 cube C0 . The cube is subdivided into eight cubes of equal size and these cubes are repeatedly refined until the finest cubes contain at most a predetermined number of panels. The set of cubes that have a non-empty intersection with T h at level-l is denoted by Cl . The collection of all the non-empty cubes S is C ¼ Ll ¼ 0 Cl . The collection of panels whose centroid is in cube n is Gn . The bounding box Bn of Gn is the smallest axiparallel box that contains Gn . xn is the center of Bn . The nonempty children of cube n are denoted by sonsðnÞ. The separation ratio of two cubes n and n0 at the same level is defined as 8 jyy0 j > n a n0 ; < miny0 A Gn0 maxx A Gn ; jxy0 j Zn;n0 ¼ y A Gn > : n ¼ n0 : 1 0
The neighbor set for any cube n A C is defined as maxfZn;n0 ; Zn0 ;n g 4 Zg;
ð6Þ
3.2. Basic FMM formulas In direct BEM the RHS b is obtained by a matrix–vector multiplication, see (5). This can be accelerated by the FMM. The key idea is to approximate the Green’s function in the interaction of well-separated cubes by a truncated Taylor series expansion. In the variable order scheme, the expansion order of the FMM is adjusted to the level. Here we let pl ¼ Ll þpL ;
ð7Þ
where pL is the order at the finest level L. Vector b is concerned with the computation of integrals of Green’s function as well as integrals of its normal derivative. For two well-separated cubes n and n0 at level l, these integrals can be approximated as Z g @ny GðjxyjÞ gðyÞ dGy X
X
Da þ b Gðjrn;n0 jÞ ðxxn Þa a!b! jaj r pl jbj r pl jaj X a ¼ ln;n0 ðxxn Þa ; xA Gn
Z
g
Gn0
jaj rpl ;
ð9Þ
where mbn0 ðgÞ is a moment of the function g, which is given by Z g mbn0 ðgÞ ¼ @ny ðyxn0 Þb gðyÞ dGy : ð10Þ Thus there are two types of moments, depending on whether g ¼ 0 or g ¼ 1. Since the translation operators for both types are identical, our notations will ignore this distinction. The computation of the local expansion coefficients from the moments in (9) is the MtL translation. The moments of a cube n depend on the center xn . In order to recursively compute the moments of a cube from the moments of its children, it is necessary to translate the center. This operation is the MtM translation. Its coefficients are derived easily from the multivariate binomial formula ! X X a ðxn0 xn Þab mbn0 ðgÞ; jajr pl ; l ¼ levelðnÞ: man ðgÞ ¼ n0 A sonsðnÞb r a
@ny ðxn0 yÞb gðyÞ dGy ð8Þ
jaj r pl
where rn;n0 ¼ xn xn0 ; a; b are multiindices and g ¼ 0 for the single and g ¼ 1 for the double layer operator. The local expansion
b
ð11Þ Note that in (11) the moments up to order pl of the son cubes are needed, however for those cubes only the moments up to order pl þ 1 r pl are computed before. Here we assume all higher order moments to vanish. It is shown in [14] that this do not amplify the error. The local expansion coefficients of a cube n0 are computed from the expansion coefficients of the cube’s parent n by translating the expansion center ! X a X aX a a ðxn0 xn Þab ðxxn0 Þb ln ðxxn Þ ¼ ln jaj r p
where levelðnÞ denotes the level of n; the parameter Z is predetermined. Two cubes which are not neighbors are said to be well-separated.
Gn0
Da þ b Gðjrn;n0 jÞ ð1Þb mbn0 ðgÞ; a!b! jbj r p jaj
Gn 0
The variable order FMM is recalled. This is the base of the WCM in Section 4. Most of the formulas in this section can be found in [14].
and
X l
3. FMM with variable order moments
N ðnÞ ¼ fn0 A C : levelðnÞ ¼ levelðn0 Þ
675
jaj r pl
0
bra
X BX ¼ @ jbj r pl
¼
X
aZb jaj r p
b
a b
!
1
aC ðxn0 xn Þab ln Aðxxn0 Þb
lbn0 ðxxn0 Þb :
ð12Þ
jbj r pl
The above operation is the LtL translation. With the above translation relations at hand, the matrix– vector multiplication in b can be readily performed by the standard procedures of FMM; see [14] for the details. 4. WCM with variable order wavelets It is well-known that the BEM using conventional nodal basis generates full matrix. The core of the WCM is to transform nodal basis into an equivalent wavelet basis, denoted by C, in which most functions have vanishing or quasi-vanishing moments. BEM using C leads to numerically sparse matrix. 4.1. Wavelet construction The wavelet functions can be transformed from the nodal functions by a sequence of local transformations within the tree structure established in Section 3.1. This process is described in detail in, e.g. [11,12,16]. Here we only give a brief summary to clarify the concept. In each finest-level cube, the nodal functions are transformed into finest-scale wavelets c and scaling functions f. In the coarser levels, the ffunctions of each cube’s children are transformed into c- and ffunctions of the cube. To understand this transformation, given a cube n, denote the ffunctions in the
ARTICLE IN PRESS 676
J. Xiao, J. Tausch / Engineering Analysis with Boundary Elements 34 (2010) 673–679
~ ...;f ~ . Their transformations f ^ ;...;f ^ is cube’s children by f 1 n 1 n given by a nonsingular n nmatrix Q and
f^ i ¼
n X
~ : ½Q j;i f j
ð13Þ
j¼1
~ ’s are supported in cube n, then the transformed Note that if the f j ^ functions f i are supported in n as well. The transformation matrix Q is at our disposition, but its choice has a large influence on the compressibility of the integral operator in the transformed basis. One approach to achieve high compression is to construct wavelets with vanishing moments up to a level-dependent order pl in (7). Note that the moments are defined by (10). The moment ^ Þ is the matrix with the coefficients ma ðf ^ Þ, where a is matrix Mn ðf i n the row and i the column index. Simple matrix algebra shows that ^ Þ ¼ Mn ðf ~ ÞQ . If we let Q be the matrix in the SVD Mn ð f ~ Þ ¼ URQ T , then the matrix Mn ðf ^ Þ has vanishing columns Mn ðf where the singular values vanish. Hence the corresponding transformed functions in (13) have the exact vanishing moment property. These functions are the cfunctions (denoted by cn;k ; k A I^ n ), the remaining functions are the ffunctions (denoted by fn;k ; kA I~ n ). To obtain more cfunctions and thus higher compression of the integral operator, it is possible to allow wavelets to have nonzero but sufficiently small moments, by including the columns corresponding to small singular values. We call these quasivanishing moment (QVM) wavelets [16]. An important parameter in this construction is the threshold esv of the singular values, which controls the residuals of the moments of the QVM-wavelets and the efficiency of the wavelet BEM. It can be given by
esv ¼ Cs 2l Zpl ;
ð14Þ
where Cs is a constant. In the above constructions, one needs to compute the moment matrix for each cube. This can be done in exactly the same way as in the FMM in Section 3.2; that is, in the finest level the moment of each cube is computed directly and in coarser levels it is transformed from the moments of the cube’s children by using MtM (11). 4.2. Matrix compression
compression is implemented according to the neighborhood defined in (6); more specifically, we only have to compute the wavelet interactions associated with neighboring cubes. Consequently, we obtain a sparse NS-form matrix, denoted by A~ ns , which is an approximation of Ans . The nonzero coefficients of A~ ns is OðNÞ. They can be computed in a fast manner (see [11,17] for the details) of which the computational cost is bounded by OðNÞ. Here we should note that, in calculating A~ ns , the MtL translation (9) in FMM is used to estimate the finteractions corresponding to well-separated cubes. The MtL translations consume much CUP time as they do in the FMM. Although A~ ns is sparse, it still contains a large number of coefficients with small values. In order to yield further compression, thus to further reduce the memory requirement, we propose to use the a-posteriori compression [19]. The key idea is to discard the coefficients of A~ ns below a level dependent threshold el , which can be given by
el ¼ Ce 2ðs2Þl Zpl þ s ;
ð15Þ
where, s is the order of singularity of the Green’s function; s ¼1 in this paper. Ce is a user-defined parameter. Let A^ ns be the approximate NS-form after the a-posteriori compression, and let ½A~ ns lij be the coefficient of A~ ns at level l. Then the compression strategy is given by ( ^ l ½A~ ns lij ½A ns ij Z el ; l ^ ð16Þ ½A ns ij ¼ ½A^ l o el : 0 ns ij
One should note that the a-posteriori compression does not always reduce the memory requirement. In the original WCM the non-zero blocks of the NS-form A~ ns is saved in dense matrix mode. To benefit from the a-posteriori compression, however, the non-zero blocks in A^ ns must be kept in sparse matrix mode which requires storing the positions in addition to the actual values of the non-zero coefficients. This additional memory overhead will probably cancel out the memory reduction of dropping the coefficients. It is easy to show that, to benefit from the aposteriori compression, one can only implement it to m n blocks in A~ ns satisfying nnz oCsp nm;
In FMM we do not compute and save the system matrix explicitly, instead we only use the block-wise low-rank decomposition of it in matrix–vector multiplication. However, in WCM since the matrix (e.g., A) can be compressed to a very sparse one a with only OðNlog NÞa Z 0 nonzero coefficients, we can afford to compute and save it. Thus the iteration in the WCM is more fast than that in the FMM. This is a main difference of the two methods. The matrix in wavelet basis is the wavelet transformation of the matrix A in nodal basis. It has two representations, namely the standard form and the non-standard form (NS-form). This paper considers the latter. For the details of the former see [7,9,10]. In the following we let Ans be the NS-form wavelet transformation of A. The typical structure of Ans can be found in [19]. The nonzero coefficients of the whole matrix Ans is OðN2 Þ. However, most of them have small values, and can be discarded without affecting the error of the result, which is known as the matrix compression. The compression scheme for the WCM based on NS-form is studied in [19]. Here it is summarized. The compression of Ans can be performed at two phases which are known as the a-priori compression and the a-posteriori compression. The former accounts mainly for the acceleration achieved by the WCM, while the latter can further reduce the memory requirement of the compressed matrix. The a-priori
ð17Þ
where, nnz is the number of non-zeros being retained after using (16); Csp ¼ Me =ðMe þ Mp Þ; Me and Mp are the memory requirements for storing a coefficient and its position, respectively. When we obtain the compressed matrix A^ ns , the linear system (4) can be solved using a iterative solver, e.g. the generalized minimal residual method (GMRES). The NS-form matrix–vector multiplication can be found in [17]. We close this section by noting that: (1) the a-priori compression in the WCM is equivalent to the low-rank approximation in the FMM, they have the same accuracy; (2) the aposteriori compression is unique to the WCM, by which the memory requirement of the NS-form matrix can be further reduced to be even lower than the near-field matrix of the FMM, as shown in [19].
5. Numerical experiments We demonstrate the performance of the WMM in solving direct BIEs in potential problems. All the computations are in core on an HP workstation with a Xeon 3.0 GHz CPU and 30 GB RAM. The resulting linear systems are solved by the GMRES.
ARTICLE IN PRESS J. Xiao, J. Tausch / Engineering Analysis with Boundary Elements 34 (2010) 673–679
677
Fig. 1. A crankshaft being discretized into 11,788 boundary elements.
5.1. A sphere model First, we test the accuracy of the WMM by solving Neumann problem of Laplace Eq. (1) on the surface of an unit sphere. Let uðxÞ ¼ x1 be the analytical solution of this problem, then the flux boundary condition can be applied accordingly. The surface is partitioned into 648 triangular elements and then refined 5 times. In the computations, we set the parameters pL ¼ 2; Z ¼ 0:4. We set the two parameters concerned with matrix compression to be Cs ¼ Ce ¼ 0:01. Tolerance for the iterative solver GMRES is 10 6. The computational times, total memory usage and the L2-error of potential u on boundary are listed in Table 1. Columns Trhs, Tmatrix, Tit and Ttotal are the time for computing the RHS using the FMM, time for computing the compressed matrix A^ ns , time for one iteration in the WCM and the total time for solving the problem; Nit stands for the number of iterations. The symbol ‘‘ 0’’ indicates the time is near 0s. It can be seen from the last column that the L2-error of potential u is of order OðhÞ. The memory usage and the computing times are of order OðNlogNÞ. In addition, one can see that the numbers of iterations Nit in this example are quite small, thus the FMM is a better choice to solve the direct BIE in this example (see Section 2.2 for the reason).
5.2. A crankshaft model Fig. 1 shows a crankshaft model1 whose boundary is initially partitioned into 11,788 elements and then refined three times; thus the finest model consists of 754,432 elements. The potential u in the crankshaft is given by u ¼ x1, boundary conditions are enforced accordingly. Flux q is given at the two end sections (gray parts in Fig. 1) and u at the remainder part. In the WMM we set the parameters pL ¼ 1; Z ¼ 0:4. We set the two parameters concerned with matrix compression to be Cs ¼ Ce ¼ 0:1. Tolerance for the iterative solver GMRES is 10 5. The computational times, memory usage and L2-error are listed in Table 2. All Nit are larger than 100, and the WCM is preferable in solving system (4). The times Trhs are much lower than Tmatrix, which implies that the WMM can achieve further speedup by reducing the time for computing the RHS using the FMM. Comparing column Tit with column Trhs indicates that the iteration in the WCM is much faster than in the FMM. The same conclusion can be drawn from the comparison of Tit with the result in Fig. 5 in [2]. The computational times and memory usage increase almost linearly with N. However, the 1
[20]).
¨ The model has been generated by the program NETGEN by J. Schoberl (see
L2-error decays slower than OðhÞ, which may caused be the edges on the boundary. 5.3. An engine-block model In this example, an attempt has been made to apply the WMM to steady-state heat conduction analysis of a engine block. In order to study the scaling of the computational time and memory requirements, the engine block is first discretized into 85,728 elements, then refined two times; the largest model has 1,286,268 elements. The engine block as well as part of its coarsest boundary element partition are shown in Fig. 2. The conductivity of the model is 10 W=m3 C. The temperature of the inner surface of the oblique tube and the temperature of the bottom surface are assumed to be 75 3 C and 100 3 C, respectively; see Fig. 2. Convective condition is given on other part of the boundary, where the difference between the surface temperature u and the ambient temperature u0 is assumed to be linearly related to the heat flux via a film coefficient hf, i.e., q ¼ hf ðuu0 Þ:
ð18Þ 23
In this problem, the film coefficient is hf ¼ 50 W=m C, the ambient temperature is u0 ¼ 22 3 C. One thing to note is that the WMM for linear system (3) cannot be applied to solve this problem in a straightforward manner due to the convective boundary condition. This can be fixed, however, by a slight modification to (3). Suppose q2 is given by the convective condition (18), then (3) can be recasted as " # " # q1 u1 ½H1 ðH2 hf G2 Þ ¼ ½G1 G2 : ð19Þ hf u0 u2 It is obvious that, in order to solve (19) using the WMM, Taylor expansion of function @Gðx; yÞ hf Gðx; yÞ @ny is needed. In the WMM we set the parameters pL ¼ 1; Z ¼ 0:4; Cs ¼ 1; Ce ¼ 0:1. Tolerance for the iterative solver GMRES is 10 4. Fig. 3(a) shows the computed temperature distribution on the surface of the engine block using the model with 85,728 elements. It is clear that the temperature distribution agrees well with that calculated by the finite element method (FEM) using ANSYS in Fig. 3(b). Table 3 lists the computational times and memory requirements of the models with different number of elements. Again, we see that the time for computing the RHS is much lower than that for the system matrix. In addition, the times and
ARTICLE IN PRESS 678
J. Xiao, J. Tausch / Engineering Analysis with Boundary Elements 34 (2010) 673–679
Fig. 2. An engine block with part of its boundary element partition.
Fig. 3. Temperature of the engine-block model computed by WMM and FEM. (a) WMM result. (b) FEM result.
Table 3 Computing times (s) and memory (MB) in example 5.3. N
Trhs
Tmatrix
Tit
Ttotal
Nit
Memory
85,728 325,774 1,286,268
32 77 438
121 355 2161
0.22 0.74 3.36
181 558 3234
114 157 170
400 1482 5772
memory usages scale almost linearly with the number of elements N.
6. Conclusion In order to achieve further speedup in solving direct BIEs, a WMM has been developed. The right-hand-side vector is computed using the FMM, and the linear system is solved using the WCM. The numerical results, performed on nontrivial
geometries, show that the WMM takes the advantages of both the FMM and the WCM, and can reduce the computational time in solving the direct BIEs of large-scale realistic problems.
Acknowledgments The work of Jinyou Xiao is in part supported by the Doctorate Foundation of Northwestern Polytechnical University under Grant no. CX200601 and the National Science Foundation of China under Grant no. 10674109. The work of Johannes Tausch is in part supported by the National Science Foundation under Grant DMS091522. The authors are also grateful to the anonymous referees for their helpful comments and suggestions that improved the quality and readability of the paper. References [1] Greengard L, Rokhlin V. A new version of the fast multipole method for the Laplace equation in three dimensions. Acta Num 1997;6:229–69.
ARTICLE IN PRESS J. Xiao, J. Tausch / Engineering Analysis with Boundary Elements 34 (2010) 673–679
[2] Shen L, Liu YJ. An adaptive fast multipole boundary element method for three-dimensional potential problems. Comput Mech 2007;39:681–91. [3] Hackbusch W, Nowak ZP. On the fast matrix multiplication in the boundary element method by panel clustering. Numer Math 1989;54:463–91. ¨ [4] Borm S. Construction of data-sparse H2 matrices by hierarchical compression. SIAM J Sci Comput 2009;31(3):1820–39. [5] Phillips JR, White J. A precorrected-FFT method for electrostatic analysis of complicated 3-D structures. IEEE Trans Comput Aided Des Integr Circuits Syst 1997;16(10):1059–72. [6] Nintcheu Fata S. Fast Galerkin BEM for 3D-potential theory. Comput Mech 2008;42(3):417–29. [7] Beylkin G, Coifman R, Rokhlin V. Fast wavelet transforms and numerical algorithms. Comm Pure Appl Math 1991;37:141–83. [8] Lage C, Schwab C. Wavelet Galerkin algorithms for boundary integral equations. SIAM J Sci Comput 1999;20:2195–222. [9] Dahmen W, Harbrecht H, Schneider R. Compression techniques for boundary integral equations—optimal complexity estimates. SIAM J Numer Anal 2006;43(6):2251–71. [10] Harbrecht H, Schneider R. Wavelet Galerkin schemes for boundary integral equations—implementation and quadrature. SIAM J Sci Comput 2006;27(4): 1347–70.
679
[11] Tausch J. A variable order wavelet method for the sparse representation of layer potentials in the non-standard form. J Numer Math 2004;12(3):233–54. [12] Tausch J, White J. Multiscale bases for the sparse representation of boundary integral operators on complex geometry. SIAM J Sci Comput 2003;24(5): 1610–29. [13] Bebendorf M, Rjasanow S. Adaptive low 2003;70:1–24. [14] Tausch J. The variable order fast multipole method for boundary integral equations of the second kind. Computing 2004;72:267–91. [15] Xiao J, Zhang D, Wen L. Fully discrete Alpert multi-wavelet Galerkin BEM in 2D. Eng Anal Bound Elem 2008;32:91–9. [16] Xiao J, Tausch J, Wen L. Approximate moment matrix decomposition in wavelet Galerkin BEM. Comput Meth Appl Mech Eng 2008;197:4000–6. [17] Xiao J, Wen L, Tausch J. On fast matrix–vector multiplication in wavelet Galerkin BEM. Eng Anal Bound Elem 2009;33:159–67. [18] Xiao J, Tausch J, Wen L, Cao Y. Combined equivalent charge formulations and fast wavelet Galerkin BEM for 3-D electrostatic analysis. Int J Numer Meth Eng 2009;79(6):753–72. [19] Xiao J, Tausch J, Hu Y. A-posteriori compression of wavelet-BEM matrices. Comput Mech 2009;44(5):705–15. ¨ [20] Schoberl J. NETGEN—an advancing front 2D/3D-mesh generator based on abstract rules. Comput Visual Sci 1997;1:41–52.