Journal of Computational Physics 253 (2013) 50–63
A fast finite difference method for three-dimensional time-dependent space-fractional diffusion equations and its efficient implementation

Hong Wang^a,*, Ning Du^b

^a Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA
^b School of Mathematics, Shandong University, Jinan, Shandong 250100, China
Article history: Received 29 November 2012; Received in revised form 7 June 2013; Accepted 30 June 2013; Available online 10 July 2013

Keywords: Anomalous diffusion; Circulant matrix; Conjugate gradient squared method; Fast Fourier transform; Space-fractional diffusion equation; Toeplitz matrix
Abstract. Fractional diffusion equations model phenomena exhibiting anomalous diffusion that cannot be modeled accurately by second-order diffusion equations. Because of the nonlocal property of fractional differential operators, numerical methods for space-fractional diffusion equations generate dense or even full coefficient matrices with complicated structures. Traditionally, these methods were solved with Gaussian elimination, which requires computational work of $O(N^3)$ per time step and $O(N^2)$ of memory, where $N$ is the number of spatial grid points in the discretization. The significant computational work and memory requirement of these methods makes a numerical simulation of three-dimensional space-fractional diffusion equations computationally prohibitively expensive. In this paper we develop an efficient and faithful solution method for the implicit finite difference discretization of time-dependent space-fractional diffusion equations in three space dimensions, by carefully analyzing the structure of the coefficient matrix of the finite difference method and delicately decomposing the coefficient matrix into a combination of sparse and structured dense matrices. The fast method has a computational work count of $O(N \log N)$ per iteration and a memory requirement of $O(N)$, while retaining the same accuracy as the underlying finite difference method solved with Gaussian elimination. Numerical experiments on a three-dimensional space-fractional diffusion equation show the utility of the fast method. © 2013 Elsevier Inc. All rights reserved.
1. Introduction

Diffusion processes occur in nature, science, engineering, and other applications. The underlying assumption in the derivation of the classical second-order diffusion equation is the existence of a mean free path and a mean free time for a particle to perform a jump. That is, a particle's motion has virtually no spatial correlation and long walks in the same direction are rare, so the variance of a particle's excursion distance is finite. The central limit theorem concludes that the probability of finding a particle somewhere in space satisfies a Gaussian distribution, which gives a probabilistic description of a diffusion process. In the last few decades more and more diffusion processes were found to be non-Fickian. In these processes the overall motion of a particle is better described by steps that are not independent and that can take vastly different times to perform. A particle's long walk may have a long correlation length, so these processes can deviate greatly from the stochastic process of Brownian motion.
* Corresponding author. Tel.: +1 803 777 4321; fax: +1 803 777 6527. E-mail address: [email protected] (H. Wang).

0021-9991/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jcp.2013.06.040
Recent studies show that fractional diffusion equations provide an adequate and accurate description of transport processes that exhibit anomalous diffusion, which cannot be modeled properly by second-order diffusion equations [3,30]. Subsequently, fractional diffusion equations have been applied in increasingly many applications. These applications range from the design of photocopiers and laser printers [30], the signaling of biological cells [27], anomalous electrodiffusion in nerve cells [10], foraging behavior of animals [26], and electrochemistry [23], to physics [21], finance [25], fluid and continuum mechanics [17], viscoelastic and viscoplastic flow [24], and solute transport in groundwater [3].

Over the last decade or so, extensive research has been conducted on the development of numerical methods for fractional diffusion equations, including finite difference methods, finite element methods, and finite volume methods [4,6,14,18,22,15,28,31,37]. However, because of the nonlocal nature of fractional differential operators, numerical methods for fractional diffusion equations raise additional numerical difficulties that were not encountered in the numerical methods for second-order diffusion equations. In the context of space-fractional diffusion equations, the corresponding numerical methods often generate dense or even full coefficient matrices with complicated structures [6,18,28,36–38,40]. Traditionally, these methods were solved via Gaussian elimination, which requires $O(N^3)$ operations per time step and $O(N^2)$ of memory, where $N$ is the number of spatial grid points in the discretization. This results in severe computational issues. For instance, each time we refine the spatial mesh size $h$ by half, the number of spatial grid points is octupled, so the computational work increases 512 times while the memory requirement increases 64 times. If the time step size $\Delta t$ is also refined by half, then the overall computational work of the numerical method increases 1024 times. Hence, the numerical simulation of space-fractional diffusion equations in three space dimensions is computationally prohibitively intensive. This is probably the reason why no numerical methods have been developed and implemented for space-fractional diffusion equations in three space dimensions in the literature, even though theoretical error estimates have been proved for some of the numerical methods [6,18]. Therefore, the development of fast and faithful numerical methods with efficient storage for three-dimensional space-fractional diffusion equations is crucial for the applications of fractional diffusion equations.

In this paper we develop an efficient and faithful solution method for the implicit finite difference discretization of time-dependent space-fractional diffusion equations in three space dimensions, by carefully analyzing the structure of the coefficient matrix of the finite difference method and delicately decomposing the coefficient matrix into a combination of sparse and structured dense matrices. The fast method has a computational work count of $O(N \log N)$ per iteration and a memory requirement of $O(N)$ per time step, while retaining the same accuracy as the regular finite difference method.

The rest of the paper is organized as follows. In Section 2 we outline the space-fractional diffusion equation we attempt to solve and present the corresponding finite difference method. In Section 3 we recall the conjugate gradient squared method and identify the bottleneck issues in the application of the method to time-dependent space-fractional diffusion equations. In Section 4 we develop an efficient $O(N)$ storage scheme for the dense stiffness matrix of the fractional diffusion equation. In Section 5 we develop a fast $O(N \log N)$ algorithm for evaluating the product of the stiffness matrix with any vector.
In Section 6 we derive an efficient implementation algorithm for the fast method developed in Section 5. In Section 7 we carry out numerical experiments to study the performance of the fast conjugate gradient squared method with the efficient implementation developed in Sections 5–6.

2. Time-dependent space-fractional diffusion equations and their finite difference approximation

We consider the initial–boundary value problem of a class of time-dependent space-fractional diffusion equations in three space dimensions:
\[
\begin{aligned}
\frac{\partial u(x,y,z,t)}{\partial t}
&- a_+(x,y,z,t)\,\frac{\partial^\alpha u(x,y,z,t)}{\partial_+ x^\alpha}
 - a_-(x,y,z,t)\,\frac{\partial^\alpha u(x,y,z,t)}{\partial_- x^\alpha} \\
&- b_+(x,y,z,t)\,\frac{\partial^\beta u(x,y,z,t)}{\partial_+ y^\beta}
 - b_-(x,y,z,t)\,\frac{\partial^\beta u(x,y,z,t)}{\partial_- y^\beta} \\
&- c_+(x,y,z,t)\,\frac{\partial^\gamma u(x,y,z,t)}{\partial_+ z^\gamma}
 - c_-(x,y,z,t)\,\frac{\partial^\gamma u(x,y,z,t)}{\partial_- z^\gamma}
 = f(x,y,z,t), \quad (x,y,z)\in\Omega,\ t\in(0,T], \\
u(x,y,z,t) &= 0, \quad (x,y,z)\in\partial\Omega,\ t\in[0,T], \\
u(x,y,z,0) &= u_o(x,y,z), \quad (x,y,z)\in\Omega. \qquad (1)
\end{aligned}
\]
Here $\Omega := (x_L, x_R) \times (y_L, y_R) \times (z_L, z_R)$ is a cuboid domain. The left-sided and right-sided fractional derivatives $\frac{\partial^\alpha u}{\partial_+ x^\alpha}$, $\frac{\partial^\alpha u}{\partial_- x^\alpha}$, $\frac{\partial^\beta u}{\partial_+ y^\beta}$, $\frac{\partial^\beta u}{\partial_- y^\beta}$, $\frac{\partial^\gamma u}{\partial_+ z^\gamma}$, and $\frac{\partial^\gamma u}{\partial_- z^\gamma}$ are defined as [24,29]
\[
\frac{\partial^\alpha u(x,y,z,t)}{\partial_+ x^\alpha} := \lim_{h \to 0^+} \frac{1}{h^\alpha} \sum_{l=0}^{\lfloor (x - x_L)/h \rfloor} g_l^{(\alpha)}\, u(x - lh, y, z, t),
\]
\[
\begin{aligned}
\frac{\partial^\alpha u(x,y,z,t)}{\partial_- x^\alpha} &:= \lim_{h \to 0^+} \frac{1}{h^\alpha} \sum_{l=0}^{\lfloor (x_R - x)/h \rfloor} g_l^{(\alpha)}\, u(x + lh, y, z, t), \\
\frac{\partial^\beta u(x,y,z,t)}{\partial_+ y^\beta} &:= \lim_{h \to 0^+} \frac{1}{h^\beta} \sum_{l=0}^{\lfloor (y - y_L)/h \rfloor} g_l^{(\beta)}\, u(x, y - lh, z, t), \\
\frac{\partial^\beta u(x,y,z,t)}{\partial_- y^\beta} &:= \lim_{h \to 0^+} \frac{1}{h^\beta} \sum_{l=0}^{\lfloor (y_R - y)/h \rfloor} g_l^{(\beta)}\, u(x, y + lh, z, t), \\
\frac{\partial^\gamma u(x,y,z,t)}{\partial_+ z^\gamma} &:= \lim_{h \to 0^+} \frac{1}{h^\gamma} \sum_{l=0}^{\lfloor (z - z_L)/h \rfloor} g_l^{(\gamma)}\, u(x, y, z - lh, t), \\
\frac{\partial^\gamma u(x,y,z,t)}{\partial_- z^\gamma} &:= \lim_{h \to 0^+} \frac{1}{h^\gamma} \sum_{l=0}^{\lfloor (z_R - z)/h \rfloor} g_l^{(\gamma)}\, u(x, y, z + lh, t), \qquad (2)
\end{aligned}
\]
where $\lfloor x \rfloor$ represents the floor of $x$ and $g_l^{(\alpha)} := (-1)^l \binom{\alpha}{l}$, with $\binom{\alpha}{l}$ being the fractional binomial coefficients. Let $h_1 := (x_R - x_L)/(N_1 + 1)$, $h_2 := (y_R - y_L)/(N_2 + 1)$, $h_3 := (z_R - z_L)/(N_3 + 1)$, and $\Delta t := T/M$ be the sizes of the spatial grids and the time step, with $N_1$, $N_2$, $N_3$, and $M$ being positive integers. We define a spatial and temporal partition $x_i := x_L + i h_1$ for $i = 0, 1, \dots, N_1 + 1$, $y_j := y_L + j h_2$ for $j = 0, 1, \dots, N_2 + 1$, $z_k := z_L + k h_3$ for $k = 0, 1, \dots, N_3 + 1$, and $t_m := m \Delta t$ for $m = 0, 1, \dots, M$.

While the first-order time derivative in (1) can be discretized by a standard first-order time difference quotient, the discretization of the fractional spatial derivatives requires special care. Meerschaert and Tadjeran [19,20] showed that a fully implicit finite difference scheme with a direct truncation of the series in (2) turns out to be unstable. Instead, they used a shifted Grünwald approximation to develop a finite difference scheme for the one-dimensional analogue of problem (1), and proved its unconditional stability and first-order convergence in space and time [19,20]. Subsequently, Meerschaert, Scheffler, and Tadjeran developed a finite difference method for the two-dimensional time-dependent space-fractional diffusion equation and proved its unconditional stability and first-order spatial and temporal convergence [18].

Let $u^m_{i,j,k} := u(x_i, y_j, z_k, t_m)$, $a^{\pm,m}_{i,j,k} := a_\pm(x_i, y_j, z_k, t_m)$, $b^{\pm,m}_{i,j,k} := b_\pm(x_i, y_j, z_k, t_m)$, $c^{\pm,m}_{i,j,k} := c_\pm(x_i, y_j, z_k, t_m)$, and $f^m_{i,j,k} := f(x_i, y_j, z_k, t_m)$. The three-dimensional finite difference method for problem (1) can be formulated as follows:
\[
\begin{aligned}
\frac{u^m_{i,j,k} - u^{m-1}_{i,j,k}}{\Delta t}
&- \frac{a^{+,m}_{i,j,k}}{h_1^\alpha} \sum_{l=0}^{i+1} g_l^{(\alpha)} u^m_{i-l+1,j,k}
 - \frac{a^{-,m}_{i,j,k}}{h_1^\alpha} \sum_{l=0}^{N_1-i+2} g_l^{(\alpha)} u^m_{i+l-1,j,k} \\
&- \frac{b^{+,m}_{i,j,k}}{h_2^\beta} \sum_{l=0}^{j+1} g_l^{(\beta)} u^m_{i,j-l+1,k}
 - \frac{b^{-,m}_{i,j,k}}{h_2^\beta} \sum_{l=0}^{N_2-j+2} g_l^{(\beta)} u^m_{i,j+l-1,k} \\
&- \frac{c^{+,m}_{i,j,k}}{h_3^\gamma} \sum_{l=0}^{k+1} g_l^{(\gamma)} u^m_{i,j,k-l+1}
 - \frac{c^{-,m}_{i,j,k}}{h_3^\gamma} \sum_{l=0}^{N_3-k+2} g_l^{(\gamma)} u^m_{i,j,k+l-1}
 = f^m_{i,j,k}, \\
&\qquad 1 \le i \le N_1,\ 1 \le j \le N_2,\ 1 \le k \le N_3,\ m = 1, 2, \dots, M. \qquad (3)
\end{aligned}
\]
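The Grünwald weights $g_l^{(\alpha)}$ appearing in (2) and (3) need never be computed from factorials: from $g_l^{(\alpha)} = (-1)^l \binom{\alpha}{l}$ one obtains the recurrence $g_0^{(\alpha)} = 1$, $g_l^{(\alpha)} = \bigl(1 - \frac{\alpha+1}{l}\bigr) g_{l-1}^{(\alpha)}$. A minimal sketch (the function name `grunwald_weights` is our own illustration, not notation from the paper):

```python
import numpy as np

def grunwald_weights(alpha, n):
    """Weights g_l^(alpha) = (-1)^l * C(alpha, l) for l = 0, ..., n,
    generated by the recurrence g_l = (1 - (alpha + 1)/l) * g_{l-1}."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for l in range(1, n + 1):
        g[l] = (1.0 - (alpha + 1.0) / l) * g[l - 1]
    return g

# Sanity check: for alpha = 2 the weights reduce to the classical
# second-difference stencil [1, -2, 1], with all later weights zero.
g = grunwald_weights(2.0, 4)
assert np.allclose(g, [1.0, -2.0, 1.0, 0.0, 0.0])
```

For a genuinely fractional order, e.g. $\alpha = 1.8$, the weights decay algebraically rather than terminating, which is exactly what produces the dense Toeplitz structure analyzed below.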
The finite difference method (3) can be proved to be unconditionally stable and first-order convergent in space and time, in a manner similar to the analysis of its two-dimensional analogue [18]. The fundamental issue facing the finite difference method (3) is its computational cost and memory requirement. To develop a fast and efficient solution technique, we first rewrite the finite difference method (3) in matrix form. Let $I$ be the $N$-by-$N$ identity matrix and $u^m$ be the $N$-dimensional vector given by
\[
\begin{aligned}
u^m = \big[ & u^m_{1,1,1}, \dots, u^m_{N_1,1,1}, u^m_{1,2,1}, \dots, u^m_{N_1,2,1}, \dots, u^m_{1,N_2,1}, \dots, u^m_{N_1,N_2,1}, \dots, \\
& u^m_{1,1,N_3}, \dots, u^m_{N_1,1,N_3}, u^m_{1,2,N_3}, \dots, u^m_{N_1,2,N_3}, \dots, u^m_{1,N_2,N_3}, \dots, u^m_{N_1,N_2,N_3} \big]^T, \qquad (4)
\end{aligned}
\]
with N = N 1 N 2 N 3 . The N-dimensional vectors um−1 and f m can be expressed in a similar fashion. The finite difference method (3) can be expressed in the matrix form
\[
\big(I + \Delta t\, A^m\big)\, u^m = u^{m-1} + \Delta t\, f^m. \qquad (5)
\]
The $N$-by-$N$ stiffness matrix $A^m$ can be expressed as an $N_3$-by-$N_3$ block matrix
\[
A^m = \big[ A^m_{k,p} \big]_{N_3 \times N_3}, \qquad k, p = 1, 2, \dots, N_3, \qquad (6)
\]
and each block $A^m_{k,p}$ is an $N_2$-by-$N_2$ block matrix
\[
A^m_{k,p} = \big[ A^m_{k,p;j,l} \big]_{N_2 \times N_2}, \qquad j, l = 1, 2, \dots, N_2, \qquad (7)
\]
with each $N_1$-by-$N_1$ block $A^m_{k,p;j,l}$ given by
\[
\begin{aligned}
\big(A^m_{k,k;j,j}\big)_{i,i} &= -\big(r^{+,m}_{i,j,k} + r^{-,m}_{i,j,k}\big) g_1^{(\alpha)} - \big(s^{+,m}_{i,j,k} + s^{-,m}_{i,j,k}\big) g_1^{(\beta)} - \big(d^{+,m}_{i,j,k} + d^{-,m}_{i,j,k}\big) g_1^{(\gamma)}, \\
\big(A^m_{k,k;j,j}\big)_{i,i-1} &= -\big(r^{+,m}_{i,j,k} g_2^{(\alpha)} + r^{-,m}_{i,j,k} g_0^{(\alpha)}\big), \\
\big(A^m_{k,k;j,j}\big)_{i,i+1} &= -\big(r^{+,m}_{i,j,k} g_0^{(\alpha)} + r^{-,m}_{i,j,k} g_2^{(\alpha)}\big), \\
\big(A^m_{k,k;j,j}\big)_{i,q} &= \begin{cases} -r^{+,m}_{i,j,k}\, g_{i-q+1}^{(\alpha)}, & q < i-1, \\ -r^{-,m}_{i,j,k}\, g_{q-i+1}^{(\alpha)}, & q > i+1, \end{cases} \\
\big(A^m_{k,k;j,j-1}\big)_{i,q} &= -\delta_{i,q} \big(s^{+,m}_{i,j,k} g_2^{(\beta)} + s^{-,m}_{i,j,k} g_0^{(\beta)}\big), \\
\big(A^m_{k,k;j,j+1}\big)_{i,q} &= -\delta_{i,q} \big(s^{+,m}_{i,j,k} g_0^{(\beta)} + s^{-,m}_{i,j,k} g_2^{(\beta)}\big), \\
\big(A^m_{k,k;j,l}\big)_{i,q} &= \begin{cases} -\delta_{i,q}\, s^{+,m}_{i,j,k}\, g_{j-l+1}^{(\beta)}, & l < j-1, \\ -\delta_{i,q}\, s^{-,m}_{i,j,k}\, g_{l-j+1}^{(\beta)}, & l > j+1, \end{cases} \\
\big(A^m_{k,k-1;j,l}\big)_{i,q} &= -\delta_{j,l}\delta_{i,q} \big(d^{+,m}_{i,j,k} g_2^{(\gamma)} + d^{-,m}_{i,j,k} g_0^{(\gamma)}\big), \\
\big(A^m_{k,k+1;j,l}\big)_{i,q} &= -\delta_{j,l}\delta_{i,q} \big(d^{+,m}_{i,j,k} g_0^{(\gamma)} + d^{-,m}_{i,j,k} g_2^{(\gamma)}\big), \\
\big(A^m_{k,p;j,l}\big)_{i,q} &= \begin{cases} -\delta_{j,l}\delta_{i,q}\, d^{+,m}_{i,j,k}\, g_{k-p+1}^{(\gamma)}, & p < k-1, \\ -\delta_{j,l}\delta_{i,q}\, d^{-,m}_{i,j,k}\, g_{p-k+1}^{(\gamma)}, & p > k+1, \end{cases} \qquad (8)
\end{aligned}
\]
where $\delta_{i,q} = 1$ if $i = q$ and $0$ otherwise, and $r^{\pm,m}_{i,j,k}$, $s^{\pm,m}_{i,j,k}$, and $d^{\pm,m}_{i,j,k}$ are defined by
\[
r^{\pm,m}_{i,j,k} = \frac{a^{\pm,m}_{i,j,k}}{h_1^\alpha}, \qquad s^{\pm,m}_{i,j,k} = \frac{b^{\pm,m}_{i,j,k}}{h_2^\beta}, \qquad d^{\pm,m}_{i,j,k} = \frac{c^{\pm,m}_{i,j,k}}{h_3^\gamma}. \qquad (9)
\]
3. Computational cost and memory requirement of the finite difference method and its fast solution with efficient storage

The stiffness matrix of the finite difference method (3) is dense and has a complex structure, as expressed in Eqs. (6)–(8). In fact, the stiffness matrix of numerical methods for space-fractional diffusion equations can even be full [6,28]. This is the reason why so far most numerical methods for space-fractional diffusion equations have been solved by Gaussian elimination, which requires $O(N^3)$ operations per time step and $O(N^2)$ of memory. For instance, each time we reduce the spatial mesh size by half, the total number of unknowns per time step is octupled. As a result, the required memory increases 64 times and the computational cost increases 512 times. If the time step size is also reduced by half, then the overall computational cost for solving the finite difference method increases 1024 times. The significantly increased computational work and memory requirement of the numerical methods for space-fractional diffusion equations urges the development of fast and faithful numerical methods with an efficient storage mechanism for these problems.

We previously derived a decomposition of the stiffness matrix into a sum of diagonal-times-Toeplitz matrices for the one-dimensional analogue of the finite difference method (3). Based on the decomposition we developed a fast operator-splitting method and a fast iterative solution method for the one-dimensional analogue of problem (1) [37,38]. The methods have a computational cost of $O(N \log N)$ per iteration or $O(N \log N)$ per time step and a memory requirement of $O(N)$. In [36] we developed a fast alternating-direction implicit finite difference method for the two-dimensional analogue of problem (1), in which the full finite difference scheme (an analogue of scheme (3)) was expressed as two families of one-dimensional finite difference methods at each time step; the fast methods developed in [37,38] were then used to solve these one-dimensional systems. The method again requires computational work of $O(N^2 \log N)$ per time step and $O(N \log N)$ of memory. However, alternating-direction implicit finite difference methods typically require that the one-dimensional
finite difference operators in the $x$ direction and in the $y$ direction commute [18]. This condition in turn requires that the diffusion coefficients in the governing equation be of particular forms. The goal of this paper is to develop a fast solution method for the full three-dimensional finite difference method (3).

We begin by reviewing the traditional conjugate gradient squared method, which can be expressed as follows [1]:

At each time step $t_m$:
    choose $u^{(0)} = u^{m-1}$
    compute $r^{(0)} := (u^{m-1} + \Delta t f^m) - (u^{(0)} + \Delta t A^m u^{(0)})$
    choose $\tilde r$ (for example, $\tilde r := r^{(0)}$)
    for $i = 1, 2, \dots$
        $\rho_{i-1} := \tilde r^T r^{(i-1)}$
        if $\rho_{i-1} = 0$, choose $\tilde r := r^{(i-1)}$
        if $i = 1$
            $w^{(1)} := r^{(0)}$;  $p^{(1)} := w^{(1)}$
        else
            $\nu_{i-1} := \rho_{i-1}/\rho_{i-2}$
            $w^{(i)} := r^{(i-1)} + \nu_{i-1} q^{(i-1)}$
            $p^{(i)} := w^{(i)} + \nu_{i-1}\big(q^{(i-1)} + \nu_{i-1} p^{(i-1)}\big)$
        end if
        $\hat v := p^{(i)} + \Delta t A^m p^{(i)}$
        $\mu_i := \rho_{i-1}/\tilde r^T \hat v$
        $q^{(i)} := w^{(i)} - \mu_i \hat v$
        $u^{(i)} := u^{(i-1)} + \mu_i\big(w^{(i)} + q^{(i)}\big)$
        $\hat q := \big(w^{(i)} + q^{(i)}\big) + \Delta t A^m \big(w^{(i)} + q^{(i)}\big)$
        $r^{(i)} := r^{(i-1)} - \mu_i \hat q$
        $\delta := (u^{m-1} + \Delta t f^m) - (u^{(i)} + \Delta t A^m u^{(i)})$
        check for convergence; continue if necessary
    end
    $u^m := u^{(i)}$

In the conjugate gradient squared formulation, at each time step $t_m$ the evaluation of the stiffness matrix $A^m$ requires $O(N^2)$ operations. The matrix–vector multiplications $A^m p^{(i)}$, $A^m (w^{(i)} + q^{(i)})$, and $A^m u^{(i)}$ all require $O(N^2)$ of computational work. All other computations in the formulation require only $O(N)$ of computational work. Furthermore, the storage of the stiffness matrix $A^m$ requires $O(N^2)$ of memory, while all other operations require only $O(N)$ of memory. In order to develop a fast conjugate gradient squared method for the efficient solution and storage of the system (5), we address the following issues in the subsequent sections: (i) an efficient storage of the stiffness matrix $A^m$, and (ii) a fast matrix–vector multiplication $A^m w$ for any vector $w$.

4. An efficient storage of the coefficient matrix $A^m$

To develop a fast solution technique for the finite difference method (3) with a minimal memory requirement, we have to carefully study the structure of the stiffness matrix $A^m$ in the finite difference method (3).

Theorem 4.1. The total memory requirement for storing the stiffness matrix $A^m$ is $O(N)$.

Proof. From the expressions (6)–(8), we conclude that the stiffness matrix $A^m$ can be decomposed as the sum of three block matrices
\[
A^m = A^{m,x} + A^{m,y} + A^{m,z}. \qquad (10)
\]
The block matrix $A^{m,x}$ is a block-diagonal matrix of order $N_3$, where each diagonal block $A_k^{m,x}$ is itself a block-diagonal matrix of order $N_2$:
\[
A^{m,x} = \operatorname{diag}\big(A_k^{m,x}\big)_{k=1}^{N_3}, \qquad A_k^{m,x} = \operatorname{diag}\big(A_{j,k}^{m,x}\big)_{j=1}^{N_2}. \qquad (11)
\]
Here each diagonal block $A_{j,k}^{m,x}$ of order $N_1$ can be decomposed as
\[
A_{j,k}^{m,x} = \operatorname{diag}\big(r_{j,k}^{+,m}\big)\, A_L^{\alpha,N_1} + \operatorname{diag}\big(r_{j,k}^{-,m}\big)\, A_R^{\alpha,N_1}, \qquad j = 1, 2, \dots, N_2,\ k = 1, 2, \dots, N_3. \qquad (12)
\]
Here $A_L^{\alpha,N_1}$ is a Toeplitz matrix of order $N_1$ defined below and $A_R^{\alpha,N_1} = (A_L^{\alpha,N_1})^T$ is the transpose of $A_L^{\alpha,N_1}$:
\[
A_L^{\alpha,N_1} = -\begin{bmatrix}
g_1^{(\alpha)} & g_0^{(\alpha)} & 0 & \cdots & \cdots & 0 \\
g_2^{(\alpha)} & g_1^{(\alpha)} & g_0^{(\alpha)} & \ddots & & \vdots \\
\vdots & g_2^{(\alpha)} & g_1^{(\alpha)} & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
g_{N_1-1}^{(\alpha)} & \ddots & \ddots & \ddots & g_1^{(\alpha)} & g_0^{(\alpha)} \\
g_{N_1}^{(\alpha)} & g_{N_1-1}^{(\alpha)} & \cdots & \cdots & g_2^{(\alpha)} & g_1^{(\alpha)}
\end{bmatrix}. \qquad (13)
\]
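For testing on small grids it can be useful to form the Toeplitz factor (13) explicitly (the fast method never stores it densely). A minimal sketch, assuming the Grünwald weights are available as a vector `g` $= [g_0^{(\alpha)}, g_1^{(\alpha)}, \dots, g_{N_1}^{(\alpha)}]$; the function name `toeplitz_factor` is our own:

```python
import numpy as np

def toeplitz_factor(g):
    """Dense A_L^{alpha,N1} from (13): Toeplitz with first column
    -[g_1, ..., g_{N1}] and first row -[g_1, g_0, 0, ..., 0]."""
    n = len(g) - 1                      # g = [g_0, g_1, ..., g_n]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i - j >= -1:             # entries depend only on i - j
                A[i, j] = -g[i - j + 1]
    return A

# For the classical order alpha = 2 (weights [1, -2, 1, 0, 0]) the factor
# reduces to the familiar tridiagonal second-difference matrix.
g = np.array([1.0, -2.0, 1.0, 0.0, 0.0])
A = toeplitz_factor(g)
assert np.allclose(np.diag(A), 2.0)        # diagonal:       -g_1 =  2
assert np.allclose(np.diag(A, 1), -1.0)    # superdiagonal:  -g_0 = -1
assert np.allclose(np.diag(A, -1), -1.0)   # subdiagonal:    -g_2 = -1
```

The right-sided factor is simply the transpose, $A_R^{\alpha,N_1} = (A_L^{\alpha,N_1})^T$, so no separate construction is needed.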
The vectors $r_{j,k}^{\pm,m}$, $s_{j,k}^{\pm,m}$, and $d_{j,k}^{\pm,m}$ for $1 \le j \le N_2$, $1 \le k \le N_3$ are defined by
\[
\begin{aligned}
r_{j,k}^{\pm,m} &= \big[r_{1,j,k}^{\pm,m}, r_{2,j,k}^{\pm,m}, \dots, r_{N_1,j,k}^{\pm,m}\big]^T, \\
s_{j,k}^{\pm,m} &= \big[s_{1,j,k}^{\pm,m}, s_{2,j,k}^{\pm,m}, \dots, s_{N_1,j,k}^{\pm,m}\big]^T, \\
d_{j,k}^{\pm,m} &= \big[d_{1,j,k}^{\pm,m}, d_{2,j,k}^{\pm,m}, \dots, d_{N_1,j,k}^{\pm,m}\big]^T, \qquad (14)
\end{aligned}
\]
where $r_{i,j,k}^{\pm,m}$, $s_{i,j,k}^{\pm,m}$, and $d_{i,j,k}^{\pm,m}$ are defined in (9).

The block matrix $A^{m,y}$ in the decomposition (10) is a block-diagonal matrix of order $N_3$, where each diagonal block $A_k^{m,y}$ is itself a block matrix of order $N_2$:
\[
A^{m,y} = \operatorname{diag}\big(A_k^{m,y}\big)_{k=1}^{N_3}, \qquad A_k^{m,y} = \big[A_{j,l;k}^{m,y}\big]_{j,l=1}^{N_2}. \qquad (15)
\]
Each block $A_{j,l;k}^{m,y}$ of order $N_1$ can be decomposed as
\[
\begin{aligned}
A_{j,j;k}^{m,y} &= -g_1^{(\beta)} \operatorname{diag}\big(s_{j,k}^{+,m} + s_{j,k}^{-,m}\big), \\
A_{j+1,j;k}^{m,y} &= -g_2^{(\beta)} \operatorname{diag}\big(s_{j+1,k}^{+,m}\big) - g_0^{(\beta)} \operatorname{diag}\big(s_{j+1,k}^{-,m}\big), \\
A_{j,j+1;k}^{m,y} &= -g_0^{(\beta)} \operatorname{diag}\big(s_{j,k}^{+,m}\big) - g_2^{(\beta)} \operatorname{diag}\big(s_{j,k}^{-,m}\big), \\
A_{l,j;k}^{m,y} &= -g_{l-j+1}^{(\beta)} \operatorname{diag}\big(s_{l,k}^{+,m}\big), \qquad l > j+1, \\
A_{j,l;k}^{m,y} &= -g_{l-j+1}^{(\beta)} \operatorname{diag}\big(s_{j,k}^{-,m}\big), \qquad l > j+1. \qquad (16)
\end{aligned}
\]
The block matrix $A^{m,z}$ in the decomposition (10) is a block matrix of order $N_3$, where each block $A_{k,p}^{m,z}$ is itself a block-diagonal matrix of order $N_2$:
\[
A^{m,z} = \big[A_{k,p}^{m,z}\big]_{k,p=1}^{N_3}, \qquad A_{k,p}^{m,z} = \operatorname{diag}\big(A_{j;k,p}^{m,z}\big)_{j=1}^{N_2}. \qquad (17)
\]
Each block $A_{j;k,p}^{m,z}$ of order $N_1$ can be decomposed as
\[
\begin{aligned}
A_{j;k,k}^{m,z} &= -g_1^{(\gamma)} \operatorname{diag}\big(d_{j,k}^{+,m} + d_{j,k}^{-,m}\big), \\
A_{j;k+1,k}^{m,z} &= -g_2^{(\gamma)} \operatorname{diag}\big(d_{j,k+1}^{+,m}\big) - g_0^{(\gamma)} \operatorname{diag}\big(d_{j,k+1}^{-,m}\big), \\
A_{j;k,k+1}^{m,z} &= -g_0^{(\gamma)} \operatorname{diag}\big(d_{j,k}^{+,m}\big) - g_2^{(\gamma)} \operatorname{diag}\big(d_{j,k}^{-,m}\big), \\
A_{j;p,k}^{m,z} &= -g_{p-k+1}^{(\gamma)} \operatorname{diag}\big(d_{j,p}^{+,m}\big), \qquad p > k+1, \\
A_{j;k,p}^{m,z} &= -g_{p-k+1}^{(\gamma)} \operatorname{diag}\big(d_{j,k}^{-,m}\big), \qquad p > k+1. \qquad (18)
\end{aligned}
\]
Thus, in order to store the stiffness matrix $A^m$ we need only store $r_{j,k}^{\pm,m}$, $s_{j,k}^{\pm,m}$, and $d_{j,k}^{\pm,m}$ for $1 \le j \le N_2$, $1 \le k \le N_3$, together with the three vectors $g^{(\alpha)} = [g_0^{(\alpha)}, g_1^{(\alpha)}, \dots, g_{N_1}^{(\alpha)}]^T$, $g^{(\beta)} = [g_0^{(\beta)}, g_1^{(\beta)}, \dots, g_{N_2}^{(\beta)}]^T$, and $g^{(\gamma)} = [g_0^{(\gamma)}, g_1^{(\gamma)}, \dots, g_{N_3}^{(\gamma)}]^T$. Hence the total memory requirement for storing $A^m$ is $6N + N_1 + N_2 + N_3 + 3 = O(N)$. □

Remark 4.1. The storage scheme stores the stiffness matrix $A^m$ exactly; it is not a lossy compression.
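The bookkeeping behind Theorem 4.1 can be sketched in a few lines: store the six scaled coefficient arrays of (9) together with the three Grünwald-weight vectors and count the floats. The function names (`grunwald_weights`, `stiffness_storage`) are our own illustration, not notation from the paper:

```python
import numpy as np

def grunwald_weights(p, n):
    """g_l^(p) = (-1)^l C(p, l) via the recurrence g_l = (1-(p+1)/l) g_{l-1}."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for l in range(1, n + 1):
        g[l] = (1.0 - (p + 1.0) / l) * g[l - 1]
    return g

def stiffness_storage(a_plus, a_minus, b_plus, b_minus, c_plus, c_minus,
                      h, exponents):
    """O(N) representation of A^m: the six coefficient arrays scaled as
    in (9) plus three weight vectors; returns (data, float_count)."""
    (h1, h2, h3), (alpha, beta, gamma) = h, exponents
    n1, n2, n3 = a_plus.shape           # coefficients sampled on the grid
    data = {
        "r+": a_plus / h1**alpha,  "r-": a_minus / h1**alpha,
        "s+": b_plus / h2**beta,   "s-": b_minus / h2**beta,
        "d+": c_plus / h3**gamma,  "d-": c_minus / h3**gamma,
        "g_alpha": grunwald_weights(alpha, n1),
        "g_beta":  grunwald_weights(beta, n2),
        "g_gamma": grunwald_weights(gamma, n3),
    }
    count = sum(v.size for v in data.values())
    return data, count                  # count = 6N + N1 + N2 + N3 + 3
```

Compared with the $N^2$ entries of the dense stiffness matrix, this count grows only linearly in $N = N_1 N_2 N_3$, which is the content of Theorem 4.1.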
5. A fast $O(N \log N)$ matrix–vector multiplication algorithm for evaluating $A^m u$

In this section we develop a fast matrix–vector multiplication algorithm for evaluating $A^m u$ for any vector $u \in \mathbb{R}^N$, which can be carried out in $O(N \log N)$ operations. Based on the splitting (10) of the stiffness matrix $A^m$, we need to derive fast matrix–vector multiplication algorithms for evaluating $A^{m,x} u$, $A^{m,y} u$, and $A^{m,z} u$ for any vector $u \in \mathbb{R}^N$.

5.1. A fast $O(N \log N)$ matrix–vector multiplication algorithm for $A^{m,x} u$

In this subsection we present a fast matrix–vector multiplication algorithm for evaluating $A^{m,x} u$ and prove that the multiplication can be performed in $O(N \log N)$ operations.

Theorem 5.1. The matrix–vector multiplication $A^{m,x} u$ can be performed in $O(N \log N)$ operations.

Proof. Let $u$ be any $N$-dimensional vector expressed in the component form (4). We express the vector $u$ in block form as follows:
\[
u = \big[u_1^T, u_2^T, \dots, u_k^T, \dots, u_{N_3}^T\big]^T, \qquad
u_k = \big[u_{1,k}^T, u_{2,k}^T, \dots, u_{j,k}^T, \dots, u_{N_2,k}^T\big]^T, \qquad
u_{j,k} = [u_{1,j,k}, u_{2,j,k}, \dots, u_{N_1,j,k}]^T. \qquad (19)
\]
Then the matrix–vector multiplication $A^{m,x} u$ is of the form
\[
A^{m,x} u = \big[\big(A_1^{m,x} u_1\big)^T, \big(A_2^{m,x} u_2\big)^T, \dots, \big(A_{N_3}^{m,x} u_{N_3}\big)^T\big]^T, \qquad
A_k^{m,x} u_k = \big[\big(A_{1,k}^{m,x} u_{1,k}\big)^T, \big(A_{2,k}^{m,x} u_{2,k}\big)^T, \dots, \big(A_{N_2,k}^{m,x} u_{N_2,k}\big)^T\big]^T, \qquad (20)
\]
where $A_k^{m,x}$ and $A_{j,k}^{m,x}$ are defined in (11) and (12).

To compute the matrix–vector multiplication $A_{j,k}^{m,x} u_{j,k}$ in $O(N_1 \log N_1)$ operations, by (12) we need only compute $A_L^{\alpha,N_1} u_{j,k}$ and $A_R^{\alpha,N_1} u_{j,k}$ in $O(N_1 \log N_1)$ operations for any $u_{j,k} \in \mathbb{R}^{N_1}$. Note that the Toeplitz matrices $A_L^{\alpha,N_1}$ and $A_R^{\alpha,N_1}$ can be embedded into $2N_1 \times 2N_1$ circulant matrices $C_L^{\alpha,2N_1}$ and $C_R^{\alpha,2N_1}$ [5,9]:
\[
C_L^{\alpha,2N_1} = \begin{bmatrix} A_L^{\alpha,N_1} & B_L^{\alpha,N_1} \\ B_L^{\alpha,N_1} & A_L^{\alpha,N_1} \end{bmatrix}, \qquad
C_R^{\alpha,2N_1} = \begin{bmatrix} A_R^{\alpha,N_1} & B_R^{\alpha,N_1} \\ B_R^{\alpha,N_1} & A_R^{\alpha,N_1} \end{bmatrix}, \qquad (21)
\]
where $B_R^{\alpha,N_1} = (B_L^{\alpha,N_1})^T$ and $B_L^{\alpha,N_1}$ is defined by
\[
B_L^{\alpha,N_1} = -\begin{bmatrix}
0 & g_{N_1}^{(\alpha)} & g_{N_1-1}^{(\alpha)} & \cdots & g_3^{(\alpha)} & g_2^{(\alpha)} \\
0 & 0 & g_{N_1}^{(\alpha)} & \ddots & \cdots & g_3^{(\alpha)} \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & & \ddots & \ddots & g_{N_1}^{(\alpha)} \\
g_0^{(\alpha)} & 0 & \cdots & 0 & 0 & 0
\end{bmatrix}. \qquad (22)
\]
It is known that a circulant matrix $C^{\alpha,2N_1}$ can be decomposed in terms of the discrete Fourier transform [5,9]:
\[
C^{\alpha,2N_1} = F_{2N_1}^{-1} \operatorname{diag}\big(F_{2N_1} c^{\alpha,2N_1}\big) F_{2N_1}, \qquad (23)
\]
where $c^{\alpha,2N_1}$ is the first column vector of $C^{\alpha,2N_1}$ and $F_{2N_1}$ is the $2N_1 \times 2N_1$ discrete Fourier transform matrix. It is well known that the matrix–vector multiplication $F_{2N_1} w$ for $w \in \mathbb{R}^{2N_1}$ can be carried out in $O(2N_1 \log(2N_1)) = O(N_1 \log N_1)$ operations via the fast Fourier transform (FFT). Thus (23) shows that $C^{\alpha,2N_1} w$ can be evaluated in $O(N_1 \log N_1)$ operations for $w = [u_{j,k}^T, 0^T]^T$. Then $A_L^{\alpha,N_1} u_{j,k}$ and $A_R^{\alpha,N_1} u_{j,k}$ can be evaluated in $O(N_1 \log N_1)$ operations for any $u_{j,k} \in \mathbb{R}^{N_1}$. Consequently (12) shows that $A_{j,k}^{m,x} u_{j,k}$ can be evaluated in $O(N_1 \log N_1)$ operations. Finally, (20) shows that $A^{m,x} u$ can be performed in $O(N_3 N_2 N_1 \log N_1) = O(N \log N)$ operations. □
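The circulant embedding (21)–(23) is straightforward to realize with an FFT library. Below is a minimal sketch for a general Toeplitz matrix given by its first column `col` and first row `row` (for $A_L^{\alpha,N_1}$ these are $-[g_1^{(\alpha)}, g_2^{(\alpha)}, \dots, g_{N_1}^{(\alpha)}]$ and $-[g_1^{(\alpha)}, g_0^{(\alpha)}, 0, \dots, 0]$); the function name `toeplitz_matvec` is our own:

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Multiply an N-by-N Toeplitz matrix (first column `col`, first row
    `row`, with col[0] == row[0]) by x in O(N log N) operations.

    The matrix is embedded into a 2N-by-2N circulant matrix, which the
    FFT diagonalizes as in (23); the Toeplitz product is the first half
    of the circulant product applied to the zero-padded vector."""
    n = len(col)
    # First column of the circulant embedding: [col, 0, row[n-1], ..., row[1]].
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_padded))
    return y[:n].real

# Sanity check against a dense Toeplitz product on a small example.
n = 6
col, row = np.random.rand(n), np.random.rand(n)
row[0] = col[0]
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.rand(n)
assert np.allclose(T @ x, toeplitz_matvec(col, row, x))
```

Since $A_R^{\alpha,N_1} = (A_L^{\alpha,N_1})^T$, the same routine handles both factors by swapping the roles of `col` and `row`, and the per-block evaluation (24) then follows by scaling with $\operatorname{diag}(r_{j,k}^{\pm,m})$ as in (12).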
Remark 5.1. We observe from the proof of Theorem 5.1 that the fast matrix–vector multiplication algorithm for evaluating $A^{m,x} u$ reduces to the fast evaluation of a sequence of $N_2 N_3$ matrix–vector multiplications
\[
A_{j,k}^{m,x} u_{j,k}, \qquad 1 \le j \le N_2,\ 1 \le k \le N_3. \qquad (24)
\]
Each matrix–vector multiplication $A_{j,k}^{m,x} u_{j,k}$ corresponds to the matrix–vector multiplication in the finite difference method with $N_1$ spatial nodes for the one-dimensional analogue of problem (1), and can be computed in $O(N_1 \log N_1)$ operations [37,38]. Moreover, all the $A_{j,k}^{m,x} u_{j,k}$ can be computed in parallel for $1 \le j \le N_2$ and $1 \le k \le N_3$ in a straightforward manner.

Remark 5.2. In the fast evaluation of each $A_{j,k}^{m,x} u_{j,k}$ in (24), the coefficient matrix $A_{j,k}^{m,x}$ does not need to be formed. In fact, $A_L^{\alpha,N_1} u_{j,k}$ and $A_R^{\alpha,N_1} u_{j,k}$ can be evaluated in $O(N_1 \log N_1)$ operations via (21), (23), and the fast Fourier transform. Then we apply (12) to compute $A_{j,k}^{m,x} u_{j,k}$ in $O(N_1 \log N_1)$ operations.

5.2. A fast $O(N \log N)$ matrix–vector multiplication algorithm for $A^{m,y} u$

We are now in a position to develop a fast matrix–vector multiplication algorithm for evaluating $A^{m,y} u$ for any vector $u \in \mathbb{R}^N$. We note that the matrix $A^{m,y}$ has a more complicated structure than $A^{m,x}$, so the fast $O(N \log N)$ matrix–vector multiplication algorithm developed for $A^{m,x} u$ in Section 5.1 does not apply here. We need to generalize the fast matrix–vector multiplication algorithm to the current case and prove the following theorem.

Theorem 5.2. The matrix–vector multiplication $A^{m,y} u$ can be performed in $O(N \log N)$ operations.

Proof. Let $I_{N_1}$ be the identity matrix of order $N_1$, and let $A_L^{\beta,N_2}$ and $A_R^{\beta,N_2}$ be the Toeplitz matrices of order $N_2$ defined in (13) with $\alpha$ and $N_1$ replaced by $\beta$ and $N_2$, respectively. Then the matrix–vector multiplication $A^{m,y} u$ is of the form
\[
A^{m,y} u = \big[\big(A_1^{m,y} u_1\big)^T, \big(A_2^{m,y} u_2\big)^T, \dots, \big(A_k^{m,y} u_k\big)^T, \dots, \big(A_{N_3}^{m,y} u_{N_3}\big)^T\big]^T, \qquad (25)
\]
where the matrix $A_k^{m,y}$ defined in (15) and (16) can be expressed as
\[
A_k^{m,y} = \operatorname{diag}\big(s_{j,k}^{+,m}\big)_{j=1}^{N_2} \big(A_L^{\beta,N_2} \otimes I_{N_1}\big) + \operatorname{diag}\big(s_{j,k}^{-,m}\big)_{j=1}^{N_2} \big(A_R^{\beta,N_2} \otimes I_{N_1}\big). \qquad (26)
\]
Here $A_L^{\beta,N_2} \otimes I_{N_1}$ is the Kronecker tensor product of $A_L^{\beta,N_2}$ and $I_{N_1}$ given below, and $A_R^{\beta,N_2} \otimes I_{N_1} = (A_L^{\beta,N_2} \otimes I_{N_1})^T$:
\[
A_L^{\beta,N_2} \otimes I_{N_1} = -\begin{bmatrix}
g_1^{(\beta)} I_{N_1} & g_0^{(\beta)} I_{N_1} & 0 & \cdots & \cdots & 0 \\
g_2^{(\beta)} I_{N_1} & g_1^{(\beta)} I_{N_1} & g_0^{(\beta)} I_{N_1} & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 0 \\
g_{N_2-1}^{(\beta)} I_{N_1} & \ddots & \ddots & \ddots & g_1^{(\beta)} I_{N_1} & g_0^{(\beta)} I_{N_1} \\
g_{N_2}^{(\beta)} I_{N_1} & g_{N_2-1}^{(\beta)} I_{N_1} & \cdots & \cdots & g_2^{(\beta)} I_{N_1} & g_1^{(\beta)} I_{N_1}
\end{bmatrix}. \qquad (27)
\]
We prove that each $A_k^{m,y} u_k$ ($k = 1, 2, \dots, N_3$) can be performed in $O((N_1 N_2) \log(N_1 N_2))$ operations. Note that the $N_1 N_2$-by-$N_1 N_2$ matrices $A_L^{\beta,N_2} \otimes I_{N_1}$ and $A_R^{\beta,N_2} \otimes I_{N_1}$ are $N_2$-by-$N_2$ block-Toeplitz–circulant-block matrices with $N_1$-by-$N_1$ circulant blocks; they can be expanded into $N_1 N_2$-by-$N_1 N_2$ block-circulant–circulant-block matrices as follows. Let $B_L^{\beta,N_2}$ and $B_R^{\beta,N_2}$ be the Toeplitz matrices of order $N_2$ defined by (22) with $\alpha$ and $N_1$ replaced by $\beta$ and $N_2$, respectively, with $B_R^{\beta,N_2} \otimes I_{N_1} = (B_L^{\beta,N_2} \otimes I_{N_1})^T$. Then the $N_1 N_2$-by-$N_1 N_2$ matrix $B_L^{\beta,N_2} \otimes I_{N_1}$ is a block-Toeplitz–circulant-block matrix of the following form:
\[
B_L^{\beta,N_2} \otimes I_{N_1} = -\begin{bmatrix}
0 & g_{N_2}^{(\beta)} I_{N_1} & g_{N_2-1}^{(\beta)} I_{N_1} & \cdots & g_3^{(\beta)} I_{N_1} & g_2^{(\beta)} I_{N_1} \\
0 & 0 & g_{N_2}^{(\beta)} I_{N_1} & \ddots & \cdots & g_3^{(\beta)} I_{N_1} \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & & \ddots & \ddots & g_{N_2}^{(\beta)} I_{N_1} \\
g_0^{(\beta)} I_{N_1} & 0 & \cdots & 0 & 0 & 0
\end{bmatrix}. \qquad (28)
\]
Then $2N_1 N_2$-by-$2N_1 N_2$ block-circulant–circulant-block matrices $C_L^{\beta,2N_1 N_2}$ and $C_R^{\beta,2N_1 N_2}$ can be introduced based on $A_L^{\beta,N_2} \otimes I_{N_1}$, $A_R^{\beta,N_2} \otimes I_{N_1}$, $B_L^{\beta,N_2} \otimes I_{N_1}$, and $B_R^{\beta,N_2} \otimes I_{N_1}$ as follows:
\[
C_L^{\beta,2N_1 N_2} := \begin{bmatrix} A_L^{\beta,N_2} \otimes I_{N_1} & B_L^{\beta,N_2} \otimes I_{N_1} \\ B_L^{\beta,N_2} \otimes I_{N_1} & A_L^{\beta,N_2} \otimes I_{N_1} \end{bmatrix}, \qquad
C_R^{\beta,2N_1 N_2} := \begin{bmatrix} A_R^{\beta,N_2} \otimes I_{N_1} & B_R^{\beta,N_2} \otimes I_{N_1} \\ B_R^{\beta,N_2} \otimes I_{N_1} & A_R^{\beta,N_2} \otimes I_{N_1} \end{bmatrix}, \qquad
w_{2N_1 N_2} := \begin{bmatrix} w \\ 0 \end{bmatrix}. \qquad (29)
\]
Here $w_{2N_1 N_2} \in \mathbb{R}^{2N_1 N_2}$ is the extension of $w$ by $0 \in \mathbb{R}^{N_1 N_2}$ for any vector $w \in \mathbb{R}^{N_1 N_2}$. It is clear that
\[
C_L^{\beta,2N_1 N_2} w_{2N_1 N_2} = \begin{bmatrix} \big(A_L^{\beta,N_2} \otimes I_{N_1}\big) w \\ \big(B_L^{\beta,N_2} \otimes I_{N_1}\big) w \end{bmatrix}, \qquad
C_R^{\beta,2N_1 N_2} w_{2N_1 N_2} = \begin{bmatrix} \big(A_R^{\beta,N_2} \otimes I_{N_1}\big) w \\ \big(B_R^{\beta,N_2} \otimes I_{N_1}\big) w \end{bmatrix}. \qquad (30)
\]
. β, N 2
⊗ I N 1 ) w and ( A R
⊗ I N 1 ) w can be obtained as the first half of the matrix–vector
w 2N 1 N 2 and C R w 2N 1 N 2 , respectively. products C L Let C β,2N 1 N 2 be a 2N 2 -by-2N 2 block-circulant–circulant-block matrix with N 1 -by-N 1 circulant blocks and c β,2N 1 N 2 be its first column vector. The corresponding two-dimensional discrete Fourier transform matrix is F 2N 2 ⊗ F N 1 . The Fourier transform cˆ β,2N 1 N 2 of c β,2N 1 N 2 is given by
cˆ β,2N 1 N 2 := ( F 2N 2 ⊗ F N 1 )c β,2N 1 N 2 .
(31)
It is known that C β,2N 1 N 2 has the following diagonalization [5,9]
C β,2N 1 N 2 = ( F 2N 2 ⊗ F N 1 )−1 diag cˆ β,2N 1 N 2 ( F 2N 2 ⊗ F N 1 ).
(32)
It is well known that for any vector w_{2N_1N_2} ∈ R^{2N_1N_2} the matrix–vector multiplication (F_{2N_2} ⊗ F_{N_1}) w_{2N_1N_2} can be carried out in O(2N_1N_2 log(2N_1N_2)) = O(N_1N_2 log(N_1N_2)) operations via the fast Fourier transform (FFT). (32) shows that C^{β,2N_1N_2} w_{2N_1N_2} can be evaluated in O(N_1N_2 log(N_1N_2)) operations. Then (29) and (30) show that (A_L^{β,N_2} ⊗ I_{N_1}) w and (A_R^{β,N_2} ⊗ I_{N_1}) w can be performed in O(N_1N_2 log(N_1N_2)) operations for any w ∈ R^{N_1N_2}. Therefore, (26) shows that A_k^{m,y} u_k can be performed in O(N_1N_2 log(N_1N_2)) operations, and then it is clear from (25) that A^{m,y} u can be performed in O(N_3 N_1N_2 log(N_1N_2)) = O(N log N) operations. □
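The doubling construction in (29)–(30) is the block analogue of the classical embedding of a Toeplitz matrix into a circulant matrix of twice its order. A minimal one-dimensional sketch of that embedding (with our own function and variable names, not the paper's) is:

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Toeplitz-times-vector via circulant embedding, in the spirit of
    (29)-(30): embed the n-by-n Toeplitz matrix into a 2n-by-2n circulant,
    pad x with n zeros, multiply via the FFT, and keep the first half.

    col : first column (t_0, t_1, ..., t_{n-1})
    row : first row    (t_0, t_{-1}, ..., t_{-(n-1)}), with row[0] == col[0]
    """
    n = len(x)
    # First column of the embedding circulant; the entry written as 0.0 is free.
    c = np.concatenate([col, [0.0], row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:n].real      # first half of the circulant product, cf. (30)
```

Each call costs O(n log n) operations and O(n) memory, versus O(n^2) for a dense Toeplitz multiply.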
Remark 5.3. We observe from the proof of Theorem 5.2 that the fast matrix–vector multiplication algorithm for evaluating A^{m,y} u reduces to the fast evaluation of a sequence of N_3 matrix–vector multiplications

A_k^{m,y} u_k,   1 ≤ k ≤ N_3.   (33)

Each matrix–vector multiplication A_k^{m,y} u_k corresponds to the matrix–vector multiplication in the finite difference method with N_1N_2 spatial nodes for the two-dimensional analogue of problem (1), and can be computed in O(N_1N_2 log(N_1N_2)) operations [34]. Moreover, all the A_k^{m,y} u_k can be computed in parallel for 1 ≤ k ≤ N_3 in a straightforward manner.
Remark 5.4. In the fast evaluation of each A_k^{m,y} u_k in (33), the coefficient matrix A_k^{m,y} does not need to be formulated. (A_L^{β,N_2} ⊗ I_{N_1}) u_k and (A_R^{β,N_2} ⊗ I_{N_1}) u_k in (26) can be computed in O(N_1N_2 log(N_1N_2)) operations via (32) and the two-dimensional fast Fourier transform. Then we apply (26) to compute A_k^{m,y} u_k in O(N_1N_2 log(N_1N_2)) operations.
5.3. A fast O(N log N) matrix–vector multiplication algorithm for A^{m,z} u

In this subsection we develop a fast matrix–vector multiplication algorithm for evaluating A^{m,z} u, which is presented in the proof of the following theorem.

Theorem 5.3. The matrix–vector multiplication A^{m,z} u can be performed in O(N log N) operations.
H. Wang, N. Du / Journal of Computational Physics 253 (2013) 50–63

Proof. Let I_{N_1N_2} be the identity matrix of order N_1N_2 and let A_L^{γ,N_3} and A_R^{γ,N_3} be Toeplitz matrices of order N_3 defined in (13), with α and N_1 being replaced by γ and N_3, respectively. The key to the development of the fast algorithm is the following decomposition of the matrix A^{m,z} defined in (17) and (18)
A^{m,z} = diag( diag( d_{j,k}^{+,m} )_{j=1}^{N_2} )_{k=1}^{N_3} ( A_L^{γ,N_3} ⊗ I_{N_1N_2} ) + diag( diag( d_{j,k}^{−,m} )_{j=1}^{N_2} )_{k=1}^{N_3} ( A_R^{γ,N_3} ⊗ I_{N_1N_2} ).   (34)

Here A_L^{γ,N_3} ⊗ I_{N_1N_2} and A_R^{γ,N_3} ⊗ I_{N_1N_2} have expressions similar to those of A_L^{β,N_2} ⊗ I_{N_1} and A_R^{β,N_2} ⊗ I_{N_1} presented in (27), with β and N_2 replaced by γ and N_3, and I_{N_1} replaced by I_{N_1N_2}, respectively.

Although they have different dimensions, the matrices A^{m,z} in (34) and A_k^{m,y} in (26) have the same structure. Thus, the matrix–vector multiplication A^{m,z} u can be performed in the same way as A_k^{m,y} u_k, in O((N_1N_2) N_3 log((N_1N_2) N_3)) = O(N log N) operations. □
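The two building blocks of (34), multiplication by A^{γ,N_3} ⊗ I_{N_1N_2} and by a diagonal scaling, never require forming the Kronecker product explicitly. A sketch under our own naming (here `A` stands for any factor matrix and `d` for the diagonal of the scaling; both are placeholders, not quantities from the paper):

```python
import numpy as np

def kron_with_identity_matvec(A, u, m):
    """Compute (A x I_m) u without forming the Kronecker product.
    u has length n*m, with the block index (size n) varying slowly."""
    n = A.shape[0]
    return (A @ u.reshape(n, m)).reshape(-1)

def scaled_kron_matvec(d, A, u, m):
    """One term of (34): diag(d) (A x I_m) u."""
    return d * kron_with_identity_matvec(A, u, m)
```

If the multiply `A @ ...` is in turn carried out by the FFT-based Toeplitz algorithm of Section 5.1, the whole term costs O(nm log n) operations instead of the O((nm)^2) of a dense product.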
Remark 5.5. We note from (34) that A^{m,z} exhibits a decomposition similar to that of A_k^{m,y} in (26). Nevertheless, the structure of A^{m,z} is fully three-dimensional. Again, in the fast evaluation of A^{m,z} u, the matrix A^{m,z} does not need to be formulated. (A_L^{γ,N_3} ⊗ I_{N_1N_2}) u and (A_R^{γ,N_3} ⊗ I_{N_1N_2}) u can be computed via the two-dimensional fast Fourier transform. Then we apply (34) to evaluate A^{m,z} u in O(N log N) operations.

Remark 5.6. The storage scheme stores the stiffness matrix A^m exactly and is not a lossy compression.

We apply Theorems 5.1, 5.2, and 5.3 and
A^m u = A^{m,x} u + A^{m,y} u + A^{m,z} u
(35)
to get the following corollary.

Corollary 5.4. The matrix–vector multiplication A^m u can be performed in O(N log N) operations.

Remark 5.7. The fast O(N log N) matrix–vector multiplication algorithm for evaluating A^m u is carried out exactly and is not a lossy approximation.

6. An efficient implementation of the fast matrix–vector multiplication algorithm for evaluating A^m u

In the previous section we developed a fast matrix–vector multiplication algorithm for evaluating A^m u in O(N log N) operations. We note that for any vector u ∈ R^N the fast evaluation of A^{m,x} u, A^{m,y} u, and A^{m,z} u requires different algorithms. In particular, the fast evaluation of A^{m,x} u can be carried out in the simplest and most straightforward manner and has the best parallelizability. The reason is that A^{m,y} and A^{m,z} have more complicated structures than A^{m,x} does. Although the roles of x, y, and z are symmetric in the fractional diffusion equation (1), the matrix A^{m,x} has a simpler structure than the matrices A^{m,y} and A^{m,z} simply because the nodes in the x direction are labeled first. This reminds us of the alternating-direction implicit method, in which a proper reindexing of the nodes in different coordinate directions, along with an operator splitting, reduces the fractional diffusion equation (1) to different families of one-dimensional problems in each coordinate direction [18,33,36]. We are motivated to apply the same idea to develop an efficient implementation mechanism for evaluating A^m u.

We notice that the matrices A^{m,x}, A^{m,y}, and A^{m,z} account for the spatial coupling in the finite difference method (3) in the x direction, the y direction, and the z direction, respectively. Further, the physical coupling of different spatial nodes is independent of the indexing of the nodes, but the structure of the matrices A^{m,x}, A^{m,y}, and A^{m,z} does depend on the specific indexing of the spatial nodes.
Let u be any vector that corresponds to the labeling given in (4) (labeling the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last). Let the vector v denote the reindexing of the vector u by labeling the nodes in the y direction first, the nodes in the z direction second, and the nodes in the x direction last
v := [u 1,1,1 , . . . , u 1, N 2 ,1 , u 1,1,2 , . . . , u 1, N 2 ,2 , . . . , u 1,1, N 3 , . . . , u 1, N 2 , N 3 , . . . , u N 1 ,1,1 , . . . , u N 1 , N 2 ,1 , u N 1 ,1,2 , . . . , u N 1 , N 2 ,2 , . . . , u N 1 ,1, N 3 , . . . , u N 1 , N 2 , N 3 ] T .
(36)
Then we have
v = P_y u
(37)

where P_y represents the permutation matrix that maps u to v. Let B^{m,y} denote the matrix that accounts for the spatial coupling in the finite difference method (3) in the y direction, which is obtained under the indexing in (36), i.e., labeling the nodes in the y direction first, then the nodes in the z direction, and finally the nodes in the x direction. Then we have
A^{m,y} = P_y^T B^{m,y} P_y.
(38)
We combine (37) and (38) to obtain
A^{m,y} u = P_y^T B^{m,y} v.
(39)

Eq. (39) gives the relationship between A^{m,y} u and B^{m,y} v. Since on the right-hand side of the equation the nodes in the y direction are labeled first, the matrix B^{m,y} has the same structure as A^{m,x} given in (13) and (20), with A replaced by B, N_3 replaced by N_1, N_2 replaced by N_3, and N_1 replaced by N_2. Then the fast matrix–vector multiplication algorithm in Section 5.1 can be applied to compute B^{m,y} v in O(N log N) operations. In addition, the fast evaluation of B^{m,y} v reduces to a sequence of N_1N_3 one-dimensional matrix–vector multiplications of size N_2 of the form (24), each of which can be computed in O(N_2 log N_2) operations as described in Section 5.1. By (39), after we obtain the vector B^{m,y} v, a reindexing of B^{m,y} v by labeling the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last generates the desired result A^{m,y} u.

We evaluate A^{m,z} u in a similar manner. Let the vector w denote the reordering of the vector u by labeling the nodes in the z direction first, the nodes in the x direction second, and the nodes in the y direction last
w := [u 1,1,1 , . . . , u 1,1, N 3 , u 2,1,1 , . . . , u 2,1, N 3 , . . . , u N 1 ,1,1 , . . . , u N 1 ,1, N 3 , . . . , u 1, N 2 ,1 , . . . , u 1, N 2 , N 3 , u 2, N 2 ,1 , . . . , u 2, N 2 , N 3 , . . . , u N 1 , N 2 ,1 , . . . , u N 1 , N 2 , N 3 ] T .
(40)
Then we have
w = P_z u
(41)
where P z represents the permutation matrix that maps u to w. Let C m,z denote the matrix that accounts for the spatial coupling in the finite difference method (3) in the z direction, which is obtained under the indexing in (40), i.e., labeling the nodes in the z direction first, the nodes in the x direction second, and the nodes in the y direction last. Then we have
A^{m,z} u = P_z^T C^{m,z} w.
(42)

Since on the right-hand side of the equation the nodes in the z direction are labeled first, the matrix C^{m,z} has the same structure as A^{m,x} given in (13) and (20), with A replaced by C, u replaced by w, N_3 replaced by N_2, N_2 replaced by N_1, and N_1 replaced by N_3. Then the fast matrix–vector multiplication algorithm in Section 5.1 can be applied to compute C^{m,z} w in O(N log N) operations. By (42), after we obtain the vector C^{m,z} w, a reindexing of C^{m,z} w by labeling the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last generates the desired result A^{m,z} u.

We summarize these results to formulate the following algorithm.

An efficient implementation algorithm of the fast matrix–vector multiplication for evaluating A^m u:

1. In the finite difference method (3) we label the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last. Let any vector u be labeled in this order as given in (4) and be partitioned into the block form defined in (19). We then use the fast matrix–vector multiplication algorithm in Section 5.1 to compute each A_{j,k}^{m,x} u_{j,k} in O(N_1 log N_1) operations for 1 ≤ j ≤ N_2 and 1 ≤ k ≤ N_3 independently to obtain A^{m,x} u.

2. Relabel the nodes in the y direction first, the nodes in the z direction second, and the nodes in the x direction last. Define the vector v to be the corresponding vector of u after the reordering. Under this labeling the matrices B_{i,k}^{m,y} have forms similar to A_{j,k}^{m,x}. So B_{i,k}^{m,y} v_{i,k} can be computed using the same fast matrix–vector multiplication algorithm in Section 5.1 in O(N_2 log N_2) operations for 1 ≤ i ≤ N_1 and 1 ≤ k ≤ N_3. This yields B^{m,y} v. Then a reindexing of the resulting vector B^{m,y} v by labeling the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last yields A^{m,y} u.

3. Relabel the nodes in the z direction first, the nodes in the x direction second, and the nodes in the y direction last. Define the vector w to be the corresponding vector of u after the reordering. Under this labeling the matrices C_{i,j}^{m,z} have forms similar to A_{j,k}^{m,x}. So C_{i,j}^{m,z} w_{i,j} can be computed using the same fast matrix–vector multiplication algorithm in Section 5.1 in O(N_3 log N_3) operations for 1 ≤ i ≤ N_1 and 1 ≤ j ≤ N_2. This yields C^{m,z} w. Then a reindexing of the resulting vector C^{m,z} w by labeling the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last yields A^{m,z} u.

4. Apply (35) to evaluate the desired matrix–vector multiplication A^m u for any vector u ∈ R^N.

Remark 6.1. One key ingredient in the fast implementation algorithm is the reindexing of the spatial nodes in the finite difference method (3). In the implementation, we use a three-dimensional array U(1:N_1, 1:N_2, 1:N_3) of size N_1N_2N_3 to store the node values u_{i,j,k} for 1 ≤ i ≤ N_1, 1 ≤ j ≤ N_2, and 1 ≤ k ≤ N_3. Thus, labeling the nodes in the x direction first, the nodes in the y direction second, and the nodes in the z direction last corresponds to running the index i from 1 to N_1 first, the index j from 1 to N_2 second, and the index k from 1 to N_3 last in the array U(1:N_1, 1:N_2, 1:N_3). Similarly, labeling the nodes in the y direction first, the nodes in the z direction second, and the nodes in the x direction last corresponds to running the index j from 1 to N_2 first, the index k from 1 to N_3 second, and the index i from 1 to N_1 last in the array U(1:N_1, 1:N_2, 1:N_3).
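In array terms, such a relabeling amounts to moving the target coordinate axis to the leading position, applying the one-dimensional operator down each grid line, and moving the axis back, i.e., the factorization P^T B P. A minimal NumPy sketch of this pattern (the function names are ours, not the paper's):

```python
import numpy as np

def apply_via_relabel(U, axis, op1d):
    """Apply a 1-D operator to every grid line of the 3-D array U along
    `axis` by relabeling (the P^T B P pattern of (38) and (42)).
    np.moveaxis returns a view, so no data is physically permuted."""
    V = np.moveaxis(U, axis, 0)           # relabel: target direction first
    W = np.apply_along_axis(op1d, 0, V)   # 1-D operator down each grid line
    return np.moveaxis(W, 0, axis)        # inverse relabel
```

With `op1d` taken to be the one-dimensional fast Toeplitz multiply of Section 5.1 (suitably scaled by the diffusion coefficients), summing the three axis applications realizes (35).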
For example, the fast evaluation of A^{m,x} u reduces to the fast evaluation of a sequence of N_2N_3 one-dimensional problems A_{j,k}^{m,x} u_{j,k} for 1 ≤ j ≤ N_2 and 1 ≤ k ≤ N_3. This corresponds to assigning u_{j,k} = U(1:N_1, j, k) as the vectors in the x direction for 1 ≤ j ≤ N_2 and 1 ≤ k ≤ N_3. We can similarly handle the fast evaluation of B^{m,y} v and C^{m,z} w, which corresponds to running the index j first (and then the index k and the index i) and the index k first (and then the index i and the index j), respectively. There is no need to physically relabel and inversely relabel the nodes.

Remark 6.2. The fast implementation algorithm only uses the fast matrix–vector multiplication algorithm developed in Section 5.1, which can be carried out via the reindexing of the spatial nodes. Therefore, the fast method can be extended to problem (1) defined on more complex spatial domains, such as L-shaped domains, unions of finitely many cuboidal domains, or even general convex domains, provided that the boundary condition can be properly incorporated into the finite difference method (the same constraint as for finite difference methods for second-order partial differential equations defined on a smooth domain).

7. Numerical experiments

In this section we carry out numerical experiments to study the performance of the fast conjugate gradient squared (FCGS) method with the efficient fast matrix–vector multiplication developed in Sections 5–6. We also compare its performance with the finite difference (FD) method (3) solved by Gaussian elimination. In the numerical experiments we compute the errors u_FD − u and u_FCGS − u in the discrete L^1, L^2, and L^∞ norms
‖u_h − u‖_{L^1} := Σ_{k=1}^{N_3} Σ_{j=1}^{N_2} Σ_{i=1}^{N_1} |u_{i,j,k}^M − u(x_i, y_j, z_k, T)| h_1 h_2 h_3,

‖u_h − u‖_{L^2} := ( Σ_{k=1}^{N_3} Σ_{j=1}^{N_2} Σ_{i=1}^{N_1} |u_{i,j,k}^M − u(x_i, y_j, z_k, T)|^2 h_1 h_2 h_3 )^{1/2},

‖u_h − u‖_{L^∞} := max_{1≤i≤N_1, 1≤j≤N_2, 1≤k≤N_3} |u_{i,j,k}^M − u(x_i, y_j, z_k, T)|,   (43)
normalized by max_{(x,y,z)∈Ω} |u(x, y, z, T)|, with u_h = u_FD or u_FCGS, respectively. In this numerical example, a_+(x,y,z,t) = a_−(x,y,z,t) = b_+(x,y,z,t) = b_−(x,y,z,t) = c_+(x,y,z,t) = c_−(x,y,z,t) = D = 0.005, α = β = γ = 1.8, and f = 0. The spatial domain is Ω = (−1, 1) × (−1, 1) × (−1, 1) and the time interval is [0, T] = [0, 1]. The true solution is the fundamental solution to (1), which is expressed via the inverse Fourier transform
u(x, y, z, t) = (1/π) ∫_0^∞ e^{−2D|cos(πα/2)|(t+0.5)ξ^α} cos(ξx) dξ
             × (1/π) ∫_0^∞ e^{−2D|cos(πβ/2)|(t+0.5)η^β} cos(ηy) dη
             × (1/π) ∫_0^∞ e^{−2D|cos(πγ/2)|(t+0.5)ζ^γ} cos(ζz) dζ.   (44)
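Each factor of (44) is a one-dimensional improper integral that can be approximated by truncating the frequency axis and applying a composite trapezoidal rule. In the sketch below, the cutoff `xi_max` and node count `n` are illustrative choices of ours, not values from the paper:

```python
import numpy as np

def fourier_factor(x, t, order, D=0.005, xi_max=120.0, n=6000):
    """Approximate one factor of (44),
    (1/pi) * int_0^inf exp(-2*D*|cos(pi*order/2)|*(t+0.5)*xi**order) * cos(xi*x) dxi,
    by the composite trapezoidal rule on the truncated interval [0, xi_max]."""
    xi = np.linspace(0.0, xi_max, n)
    f = np.exp(-2.0 * D * abs(np.cos(np.pi * order / 2.0))
               * (t + 0.5) * xi ** order) * np.cos(xi * x)
    dxi = xi[1] - xi[0]
    integral = dxi * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule
    return integral / np.pi
```

The full reference solution is then the product `fourier_factor(x, t, alpha) * fourier_factor(y, t, beta) * fourier_factor(z, t, gamma)`.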
The initial condition u_0(x, y, z) is chosen to be u(x, y, z, 0). In Table 1 we present the L^1, L^2, and L^∞ errors at successively refined spatial meshes and time steps, and use a linear regression to fit the convergence rates and the associated constants in the expression, with h := h_1 = h_2 = h_3 = Δt,
‖u_h − u‖_{L^p} ≈ C_κ h^κ,   p = 1, 2, ∞.   (45)
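The norms of (43) and the regression of (45) can be sketched as follows; the helper names are ours, and `u_h`, `u_exact` stand for the N_1-by-N_2-by-N_3 arrays of computed and true nodal values:

```python
import numpy as np

def discrete_errors(u_h, u_exact, h1, h2, h3):
    """Discrete L1, L2, and Linf errors of (43), each normalized by
    max |u_exact| as in the text."""
    diff = np.abs(u_h - u_exact)
    scale = np.abs(u_exact).max()
    err_l1 = diff.sum() * h1 * h2 * h3
    err_l2 = np.sqrt((diff ** 2).sum() * h1 * h2 * h3)
    return err_l1 / scale, err_l2 / scale, diff.max() / scale

def fit_convergence(hs, errors):
    """Least-squares fit of log(err) = log(C_kappa) + kappa*log(h), cf. (45).
    Returns (C_kappa, kappa)."""
    kappa, log_c = np.polyfit(np.log(hs), np.log(errors), 1)
    return np.exp(log_c), kappa
```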
We also present the consumed CPU times in Table 2. Both numerical methods were implemented in Fortran 90 and all the numerical experiments were carried out on a workstation with 96 GB of memory.

The finite difference method requires a computational work count of O(N^3) operations per time step and a memory count of O(N^2). Each time we refine the spatial mesh size by half, the number of spatial grid points is octupled. As a result, the computational work increases 512 times and the memory requirement increases 64 times. If the time step size is also reduced by half, then the overall computational work for solving the problem increases 1024 times. The numerical results in Table 2 are consistent with these predictions. The finite difference method (3) solved the three-dimensional space-fractional diffusion equation (1) with a mesh size of h = 2^{−3} (i.e., 4096 spatial nodes) and a time step size of Δt = 2^{−3} (i.e., 8 time steps) in 1 hour 4 minutes and 25 seconds. The method solved the same problem with h = 2^{−4} (i.e., 32,768 spatial nodes)
Table 1
The (normalized) L^1, L^2, and L^∞ errors of the FCGS method and the FD method.

FD
h = Δt   # of nodes    ‖u_FD − u‖_{L^1}    ‖u_FD − u‖_{L^2}    ‖u_FD − u‖_{L^∞}
2^{−3}   4096          6.9722 × 10^{−3}    8.5324 × 10^{−3}    6.1686 × 10^{−2}
2^{−4}   32,768        2.4481 × 10^{−3}    3.6213 × 10^{−3}    3.4276 × 10^{−2}

FCGS
h = Δt   # of nodes    ‖u_FCGS − u‖_{L^1}  ‖u_FCGS − u‖_{L^2}  ‖u_FCGS − u‖_{L^∞}
2^{−3}   4096          2.8723 × 10^{−3}    8.7762 × 10^{−3}    6.3448 × 10^{−2}
2^{−4}   32,768        1.0401 × 10^{−3}    3.9593 × 10^{−3}    4.0999 × 10^{−2}
2^{−5}   262,144       3.6648 × 10^{−4}    1.3037 × 10^{−3}    2.4586 × 10^{−2}
2^{−6}   2,097,152     1.5960 × 10^{−4}    4.9146 × 10^{−4}    9.5627 × 10^{−3}
2^{−7}   16,777,216    8.5728 × 10^{−5}    2.1301 × 10^{−4}    4.0227 × 10^{−3}
C_κ                    0.0369              0.1597              0.6197
κ                      1.2837              1.3739              1.0059
Table 2
The consumed CPU time of the FCGS method and the FD method.

FD
h = Δt   # of nodes    CPU (seconds)
2^{−3}   4096          3865.76 = 1 h 4 m 25 s
2^{−4}   32,768        122,952 m = 2 months 25 d 9 h 12 m

FCGS
h = Δt   # of nodes    CPU (seconds)
2^{−3}   4096          1.23
2^{−4}   32,768        17.31
2^{−5}   262,144       326 = 5 m 26 s
2^{−6}   2,097,152     5800 = 1 h 36 m 40 s
2^{−7}   16,777,216    101,102 = 1 day 4 h 5 m 2 s
and Δt = 2^{−4} (i.e., 16 time steps) in 2 months and 25 days of CPU time. The method ran out of memory for the same problem with h = 2^{−5}, i.e., 262,144 spatial nodes. In contrast, the fast conjugate gradient squared (FCGS) method solved problem (1) with h = 2^{−3} and Δt = 2^{−3} in 1.23 seconds of CPU time and with h = 2^{−4} and Δt = 2^{−4} in 17.31 seconds of CPU time, respectively. Moreover, the FCGS method solved problem (1) with a mesh size of h = 2^{−7} (i.e., 16,777,216 spatial nodes) and a time step size of Δt = 2^{−7} (i.e., 128 time steps) in 1 day 4 hours 5 minutes and 2 seconds of CPU time on the same workstation. This shows the utility of the fast method developed.

8. Concluding remarks

In this paper we develop a fast solution method for the implicit finite difference discretization of time-dependent space-fractional diffusion equations in three space dimensions, by carefully analyzing the structure of the coefficient matrix of the finite difference method and delicately decomposing the coefficient matrix into a combination of sparse and structured dense matrices. The goal of this paper is not to develop a new method but rather to develop a fast solution method that accelerates an existing finite difference method with already proved stability and convergence. Although many numerical methods have been developed for space-fractional diffusion equations, virtually all of them have significant computational cost and memory requirements, which present major computational obstacles to realistic numerical simulation in three space dimensions. The fast method developed in this paper has a computational work count of O(N log N) per iteration and a memory requirement of O(N) per time step, while retaining the same accuracy as the finite difference method. With slight modifications the idea of the fast solution method applies to other numerical methods.
A fast solution method for a second-order Crank–Nicolson finite difference method was developed in [2] for space-fractional diffusion equations in one space dimension, while a fast finite volume method with an efficient preconditioner was developed in [35] for a space-fractional conservative diffusion equation in one space dimension. There have been many works in the literature on the stability and error estimates of the finite difference, finite element, and spectral methods [6–8,18–20,11–14,16,32]. However, the theoretical results on the existence and uniqueness of the true solution to space-fractional differential equations are meager. Ervin and Roop [7,8] developed a Galerkin weak formulation for space-fractional elliptic differential equations with a constant diffusivity coefficient and proved that the formulation is coercive and bounded, which guarantees the existence and uniqueness of the weak solution. In [39] one of the authors and his collaborator presented a counterexample to show that the Galerkin weak formulation loses its coercivity in the context of variable-coefficient problems. We derived a Petrov–Galerkin formulation for a class of space-fractional variable-coefficient diffusion equations and proved that the weak formulation is weakly coercive. Thus, the Petrov–Galerkin formulation has a unique weak solution in a slightly weaker fractional Sobolev space.
Acknowledgements This work was supported in part by the National Science Foundation under grants EAR-0934747 and DMS-1216923, by the National Natural Science Foundation of China under grants 10801092 and 91130010, and the National Basic Research Program of China 973 Program, under grant 2007CB814906. The authors would like to express their sincere thanks to the referees for their very helpful comments and suggestions, which greatly improved the quality of this paper. References [1] R. Barrett, M. Berry, T.F. Chan, J. Demmel, J.M. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. Van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA, 1994. [2] T.S. Basu, H. Wang, A fast second-order finite difference method for space-fractional diffusion equations, Int’l J. Numer. Anal. Modeling 9 (2012) 658–666. [3] D. Benson, S.W. Wheatcraft, M.M. Meerschaert, The fractional-order governing equation of Lévy motion, Water Resour. Res. 36 (2000) 1413–1423. [4] S. Chen, F. Liu, ADI-Euler and extrapolation methods for the two-dimensional fractional advection–dispersion equation, J. Appl. Math. Comput. 26 (2008) 295–311. [5] P.J. Davis, Circulant Matrices, Wiley-Interscience, New York, 1979. [6] V.J. Ervin, N. Heuer, J.P. Roop, Numerical approximation of a time dependent, nonlinear, space-fractional diffusion equation, SIAM J. Numer. Anal. 45 (2007) 572–591. [7] V.J. Ervin, J.P. Roop, Variational formulation for the stationary fractional advection dispersion equation, Numer. Methods Partial Differential Equ. 22 (2005) 558–576. [8] V.J. Ervin, J.P. Roop, Variational solution of fractional advection dispersion equations on bounded domains in R d , Numer. Methods Partial Differential Equ. 23 (2007) 256–281. [9] R.M. Gray, Toeplitz and circulant matrices: A review, Found. Trends Commun. Inf. Theory 2 (3) (2006) 55–239. [10] T.A.M. Langlands, B.I. Henry, S.L. 
Wearne, Fractional cable equation models for anomalous electrodiffusion in nerve cells: Infinite domain solutions, J. Math. Biol. 59 (2009) 761–808. [11] X. Li, C. Xu, The existence and uniqueness of the weak solution of the space–time fractional diffusion equation and a spectral method approximation, Commun. Comput. Phys. 8 (2010) 1016–1051. [12] C. Li, F. Zeng, F. Liu, Spectral approximations to the fractional integral and derivative, Fract. Calc. Appl. Anal. 15 (2012) 383–406. [13] R. Lin, F. Liu, V. Anh, I. Turner, Stability and convergence of a new explicit finite-difference approximation for the variable-order nonlinear fractional diffusion equation, Appl. Math. Comput. 212 (2009) 435–445. [14] F. Liu, V. Anh, I. Turner, Numerical solution of the space fractional Fokker–Planck equation, J. Comput. Appl. Math. 166 (2004) 209–219. [15] Q. Liu, F. Liu, I. Turner, V. Anh, Numerical simulation for the three-dimensional seepage flow with fractional derivatives in porous media, IMA J. Appl. Math. 74 (2009) 201–229. [16] Q. Liu, F. Liu, I. Turner, V. Anh, Finite element approximation for the modified anomalous subdiffusion process, Appl. Math. Model. 35 (2011) 4103–4116. [17] F. Mainardi, Fractals and Fractional Calculus Continuum Mechanics, Springer-Verlag, 1997, pp. 291–348. [18] M.M. Meerschaert, H.P. Scheffler, C. Tadjeran, Finite difference methods for two-dimensional fractional dispersion equation, J. Comput. Phys. 211 (2006) 249–261. [19] M.M. Meerschaert, C. Tadjeran, Finite difference approximations for fractional advection–dispersion flow equations, J. Comput. Appl. Math. 172 (2004) 65–77. [20] M.M. Meerschaert, C. Tadjeran, Finite difference approximations for two-sided space-fractional partial differential equations, Appl. Numer. Math. 56 (2006) 80–90. [21] R. Metzler, J. Klafter, The random walk’s guide to anomalous diffusion: A fractional dynamics approach, Phys. Rep. 339 (2000) 1–77. [22] D.A. 
Murio, Implicit finite difference approximation for time fractional diffusion equations, Comput. Math. Appl. 56 (2008) 1138–1145. [23] K.B. Oldham, Fractional differential equations in electrochemistry, Adv. Engrg. Software 41 (2010) 9–12. [24] I. Podlubny, Fractional Differential Equations, Academic Press, New York, 1999. [25] M. Raberto, E. Scalas, F. Mainardi, Waiting-times and returns in high-frequency financial data: An empirical study, Physica 314 (2002) 749–755. [26] G. Ramos-Fernandez, J. Mateos, O. Miramontes, G. Cocho, H. Larralde, B. Ayala-Orozco, Lévy walk patterns in the foraging movements of spider monkeys (Ateles geoffroyi), Behav. Ecol. Sociobiol. 55 (2004) 223–230. [27] K. Ritchie, X.-Y. Shan, J. Kondo, K. Iwasawa, T. Fujiwara, A. Kusumi, Detection of non-Brownian diffusion in the cell membrane in single molecule tracking, Biophysical J. 88 (2005) 2266–2277. [28] J.P. Roop, Computational aspects of FEM approximation of fractional advection dispersion equations on bounded domains in R 2 , J. Comput. Appl. Math. 193 (2006) 243–268. [29] S.G. Samko, A.A. Kilbas, O.I. Marichev, Fractional Integrals and Derivatives: Theory and Applications, Gordon and Breach, Yverdon, 1993. [30] H. Scher, E.W. Montroll, Anomalous transit-time dispersion in amorphous solids, Phys. Rev. B 12 (1975) 2455–2477. [31] E. Sousa, Finite difference approximates for a fractional advection diffusion problem, J. Comput. Phys. 228 (2009) 4038–4054. [32] L. Su, W. Wang, H. Wang, A characteristic finite difference method for transient fractional advection–diffusion equations, Appl. Numer. Math. 61 (2011) 946–960. [33] R.S. Varga, Matrix Iterative Analysis, second edition, Springer-Verlag, Berlin, Heidelberg, 2000. [34] H. Wang, T. Basu, A fast finite difference method for two-dimensional space-fractional diffusion equations, SIAM J. Sci. Comput. 34 (2012) A2444–A2458. [35] H. Wang, N. 
Du, A superfast-preconditioned iterative method for steady-state space-fractional diffusion equations, J. Comput. Phys. 240 (2013) 49–57. [36] H. Wang, K. Wang, An O ( N log2 N ) alternating-direction finite difference method for two-dimensional fractional diffusion equations, J. Comput. Phys. 230 (2011) 7830–7839. [37] H. Wang, K. Wang, T. Sircar, A direct O ( N log2 N ) finite difference method for fractional diffusion equations, J. Comput. Phys. 229 (2010) 8095–8104. [38] K. Wang, H. Wang, A fast characteristic finite difference method for fractional advection–diffusion equations, Adv. Water Resour. 34 (2011) 810–816. [39] H. Wang, D. Yang, Wellposedness of variable-coefficient conservative fractional elliptic differential equations, SIAM J. Numer. Anal. 51 (2013) 1088–1107. [40] Q. Yu, F. Liu, I. Turner, K. Burrage, A computationally effective alternating direction method for the space and time fractional Bloch–Torrey equation in 3-D, Appl. Math. Comp. 219 (2012) 4082–4095.