A parallel CGS block-centered finite difference method for a nonlinear time-fractional parabolic equation

A parallel CGS block-centered finite difference method for a nonlinear time-fractional parabolic equation

Available online at www.sciencedirect.com ScienceDirect Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348 www.elsevier.com/locate/cma A parallel...

520KB Sizes 5 Downloads 129 Views

Available online at www.sciencedirect.com

ScienceDirect Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348 www.elsevier.com/locate/cma

A parallel CGS block-centered finite difference method for a nonlinear time-fractional parabolic equation Zhengguang Liu 1 , Xiaoli Li ∗,1 School of Mathematics, Shandong University, Jinan, Shandong 250100, China Received 14 December 2015; received in revised form 22 May 2016; accepted 24 May 2016 Available online 3 June 2016

Abstract In this article, a parallel Conjugate Gradient Squared (CGS) block-centered finite difference scheme is introduced and analyzed to cast about the numerical solution of a nonlinear time-fractional parabolic equation with the Neumann condition and a nonlinear reaction term. The unconditionally stable result, which just depends on initial value and source item, is derived. Some a priori estimates of discrete L 2 -norm with optimal order of convergence O(∆t 2−α +h 2 +k 2 ) with pressure and velocity are established on both uniform and non-uniform rectangular grids, where ∆t is the time step, h and k are maximal mesh sizes of x and y-directional grids. In our model, because the simulation is iterated over a series of time steps, it is most beneficial if all the calculation units in a time step can be run separately. In CGS algorithm, adding OpenMP instructions to the circulation calculations in an iterative step can realize parallel computing. To examine the efficiency and accuracy of the proposed method, numerical experiments using the schemes are studied. The results clearly show the benefit of using the proposed approach in terms of execution time reduction and speedup with respect to the sequential running in a single thread. c 2016 Elsevier B.V. All rights reserved. ⃝

Keywords: Block-centered finite difference; Time-fractional parabolic equation; Error estimates; CGS; OpenMP

1. Introduction In recent years, many problems in physical science, electromagnetism, electrochemistry, diffusion and general transport theory can be solved by the fractional calculus approach, which gives attractive applications as a new modeling tool in a variety of scientific and engineering fields. Roughly speaking, the fractional models can be classified into two principal kinds: space-fractional differential equation [1–6] and time-fractional one [7–16]. At present, there are a huge number of theoretical and applied works devoted to the study of fractional differential equations. Numerical methods and theory of solutions of the problems for fractional differential equations have been studied extensively by many researchers which mainly cover finite element methods [7–11], mixed finite element ∗ Corresponding author.

E-mail addresses: [email protected] (Z. Liu), [email protected] (X. Li). 1 The authors contributed equally to this work. http://dx.doi.org/10.1016/j.cma.2016.05.028 c 2016 Elsevier B.V. All rights reserved. 0045-7825/⃝

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

331

methods [17,18], finite difference methods [12–16], finite volume (element) methods [19], (local) discontinuous Galerkin (L)DG methods [20], spectral methods [21,22] and so on. This paper concerns with methods for obtaining accurate numerical approximations to the solutions of nonlinear time fractional parabolic equation. The time fractional parabolic partial differential equation is a type of second order partial differential equations (PDEs), describing a wide family of problems including heat diffusion and ocean acoustic propagation, in physical or mathematical systems with a time variable, which behave essentially like heat diffusing through a solid [23]. Significant progress has already been made in the approximation of the time fractional order dispersion equation, see for example [24]. Schumer [25] firstly develops the fractional-order, mobile/immobile (MIM) model. The time drift term ∂u/∂t is added to describe the motion time and thus helps to distinguish the status of particles conveniently. In most cases, it is difficult, or infeasible, to find the analytical solution or good numerical solution of the problems. In recent years, Some related methods have been considered by many authors. For example, numerical simulation of the fractional order mobile/immobile advection–dispersion model is considered by Liu et al. [26]. Furthermore, Zhang and Liu [27] present a novel numerical method for the time variable fractional order mobile–immobile advection–dispersion model. The finite difference schemes are used by Ashyralyev and Cakir [28] for solving one-dimensional fractional parabolic partial differential equations. They [29] also give the FDM for fractional parabolic equations with the Neumann condition. Block-centered finite differences, sometimes called cell-centered finite differences, can be thought as the lowest order Raviart–Thomas mixed element method [30], with proper quadrature formulation. In [31], Wheeler presents the mixed finite elements for elliptic problems with tensor coefficients as cell-centered finite differences. And in 2012, a block-centered finite difference method for the Darcy–Forchheimer model is considered in [32]. In [33–35] blockcentered finite difference methods are developed to solve linear and nonlinear equations. In this paper our target is to present the block-centered finite difference method to solve a nonlinear time-fractional parabolic equation with the Neumann condition and a nonlinear reaction term which has been presented in [36,29]. And specially, second-order error estimates in spacial mesh-size both for pressure and velocity in discrete L 2 norms are established on both uniform and non-uniform rectangular grids. The unconditionally stable result, which just depends on initial value and source item, is derived. Some a priori estimates of L 2 -norm with optimal order of convergence O(∆t 2−α + h 2 + k 2 ) with pressure and velocity are obtained, where ∆t is the time step, h and k are maximal mesh sizes of x and y-directional grids. In our model, because the simulation is iterated over a series of time steps, it is most beneficial if all the calculation units in a time step can be run separately. In Conjugate Gradient Squared (CGS) [37], adding OpenMP instructions [38,39] to the circulation calculations in an iterative step can realize parallel computing. Different vector multiply vector calculations are distributed to different threads. The efficiency reduces as the number of threads increases while all are higher than 68%. The paper is organized as follows. In Section 2, we give the problem, some notations and lemmas. In Section 3, we present the block-centered finite difference methods. Then in Section 4, we present the analysis of stability and error estimates for the presented methods. In Section 5, an OpenMP parallel Conjugate Gradient Squared (CGS) algorithm for solving nonsymmetric positive definite system which arises from the discrete methods is discussed. And some numerical experiments using the block-centered finite difference schemes are carried out. 2. The problem and some notations In this section, we first describe the problem of two dimensional nonlinear time-fractional parabolic equation (see [36,29]) with the Neumann condition and a nonlinear reaction term in this paper, and present some notations which will be found helpful in the following analysis. Find p = p(x, y, t) such that ∂p C α + D p − ∇ · (a(x, y, t)∇ p) = f ( p) + g(x, y, t), ∂t 0 t with Neumann boundary condition a(x, y, t)∇ p · n = 0,

(x, y, t) ∈ ∂Ω × J,

and initial condition p |t=0 = p0 (x, y),

(x, y) ∈ Ω .

(x, y, t) ∈ Ω × J,

332

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Introduce u = (u x , u y ) = −a∇ p, then the nonlinear system of above equations can be transformed into the following formulation ∂p C α + D p+∇ ·u ∂t 0 t u u·n p |t=0

= f ( p) + g(x, y, t),

(x, y, t) ∈ Ω × J,

= −a(x, y, t)∇ p, (x, y, t) ∈ Ω × J, = 0, (x, y, t) ∈ ∂Ω × J, = p0 (x, y), (x, y) ∈ Ω ,

(1)

here Ω = (0,  1) × (0, 1), J x= (0,  T ],nx represents the unit exterior normal vector to the boundary of Ω , a ∂ ∂ = a (x, y, t) a y (x, y, t) and p0 (x, y) are given known functions. And there ∇ = ∂ x , ∂ y , a(x, y, t) = ay exist two positive constants β and γ such that 0 < β 6 ax 6 γ ,

0 < β 6 ay 6 γ.

(2)

α Here, we consider the case 0 < α < 1. C 0 Dt p in (1) is defined as the following Caputo fractional derivative of order α, given by [1]  t ∂ p(x, y, τ ) dτ 1 C α , 0 < α < 1. (3) D p(x, y, t) = 0 t Γ (1 − α) 0 ∂τ (t − τ )α

In particular, the nonlinear reaction term f ( p) satisfies the smoothness assumptions [7]: A1: | f ( p)| 6 C| p|, A2: The first-order derivative of f ( p) with respect to p is bounded, that is to say, there exists a positive constant C such that | f ′ ( p)| 6 C. Usually for compressible flow in porous media, p represents the pressure while u represents the velocity of the fluid, so we call p pressure and u velocity. Now, we define some notations. Let N > 0 be a positive integer. Set △t = T /N ;

tn = n△t,

for n ≤ N .

The two dimensional domain Ω = (0, 1) × (0, 1) is partitioned by δx × δ y , where δx : 0 = x 1 < x 3 < · · · < x Nx − 1 < x Nx + 1 = 1, 2

2

2

2

δ y : 0 = y 1 < y 3 < · · · < y N y − 1 < y N y + 1 = 1. 2

2

2

2

For i = 1, . . . , N x and j = 1, . . . , N y , define [32,33,35] xi =

xi− 1 + xi+ 1 2

h i = xi+ 1

2

2 , 2 − xi− 1 , 2

h = max h i , 1≤i≤N x

h i + h i+1 , h i+ 1 = xi+1 − xi = 2 2 y j− 1 + y j+ 1 2 2 yj = , 2 k j = y j+ 1 − y j− 1 , k = max k j , 2

k j+ 1

2

2

1≤ j≤N y

k j + k j+1 = y j+1 − y j = . 2

For functions f (x, y) and g(x, y), define the L 2 inner products and norms   ( f, g) = f (x, y)g(x, y)d xdy, ∥ f ∥ = ( f, f ). Ω

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

333

Let gi, j , gi+ 1 , j , gi, j+ 1 denote g(xi , y j ), g(xi+ 1 , y j ) and g(xi , y j+ 1 ). Define the discrete inner products and norms 2 2 2 2 as follows: ( f, g)m =

Ny Nx  

h i k j f i, j gi, j ,

i=1 j=1

( f, g)x =

Ny Nx  

h i− 1 k j f i− 1 , j gi− 1 , j , 2

i=2 j=1

( f, g) y =

Ny Nx  

2

2

h i k j− 1 f i, j− 1 gi, j− 1 , 2

i=1 j=2

∥ f ∥i2 = ( f, f )i ,

2

2

i = m, x, y.

Noting that for f , we have ∥ f ∥m = ∥ f ∥, then we define [dx g]i+ 1 , j = 2

gi+1, j − gi, j , h i+ 1 2

[d y g]i, j+ 1 2

gi, j+1 − gi, j = , k j+ 1 2

[Dx g]i, j = [D y g]i, j =

gi+ 1 , j − gi− 1 , j 2

2

gi, j+ 1 2

hi − gi, j− 1

2

kj

, ,

g n − g n−1 , △t 3g n+1 − 4g n + g n−1 [Dt g]n+1 = . 2△t [dt g]n =

Next we also give some lemmas which are used in stability analysis and error estimates. Lemma 1. If p is sufficiently smooth, then there holds    2  ∂ pi+1/2, j 1  2∂ p x   = [d p] − + ϵi+1/2, d h x i+1/2, j x  ∂x j ( p), 8 ∂ x 2 i+1/2, j   2  ∂ pi, j+1/2 1  y 2∂ p   = [d p] − d k + ϵi, j+1/2 ( p), y y i, j+1/2  ∂y 8 ∂ y 2 i, j+1/2 with the following approximate properties x 2 ϵi+1/2, j ( p) = O(h ),

y

ϵi, j+1/2 ( p) = O(k 2 ).

Here h i and k j are looked as discrete functions,     2p 2p 2 p  ∂ ∂ ∂ 1 i+1, j i, j 2 dx h 2 2 = h i+1 − h i2 , h i+1/2 ∂x ∂x2 ∂x2 i+1/2, j     2 2 2  1 2∂ p 2 ∂ pi, j+1 2 ∂ pi, j dy k = k j+1 − kj . k j+1/2 ∂ y 2 i, j+1/2 ∂ y2 ∂ y2

334

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Lemma 2. If p is sufficiently smooth, then there holds   1 x x   u = −[dx ( p − δ)]i+1/2, j −  ϵi+1/2,  ax j ( p), i+1/2, j  1  y  = −[d y ( p − δ)]i, j+1/2 −  ϵi, j+1/2 ( p),  y uy a i, j+1/2 with the following approximate properties     y 2 2 x 2 2 ( p) = O h + k .  ϵi+1/2, ( p) = O h + k ,  ϵ j i, j+1/2 y

y

y

x x x Lemma 3. Let qi, j , wi+1/2, j and wi, j+1/2 be any values such that w1/2, j = w N x +1/2, j = wi,1/2 = wi,N y +1/2 = 0, then we have

(q, Dx w x )m = −(dx q, w x )x , (q, D y w y )m = −(d y q, w y ) y . These Lemmas 1–3 can be proven similar to the literature [33]. Throughout the paper we use C, with or without subscript, to denote a positive constant, which could have different values at different appearances. 3. Block-centered finite difference methods The objective of this section is to consider the block-centered finite difference methods for Eqs. (1). Firstly, for the convenience of theoretical analysis, we now denote G αk = (k + 1)1−α − k 1−α .

(4)

Then from the literature [21], it is not difficult to verify that 1 = G α0 > G α1 > G α2 > · · · > G αn > · · · → ∆t α → 0.

(5)

To give the fully discrete analysis based on the spatial semi-discrete system, we need to approximate both integer and fractional derivatives in time. For 0 < α < 1, by the simple calculation at t = tn+1 , we can get the equality easily. n ∂ α p(x, y, tn+1 ) ∆t 1−α  = G α [dt p]k+1 + E 0 , ∂t α Γ (2 − α) k=0 n−k

(6)

where n  1 E0 = Γ (1 − α) k=0

×



tk+1

tk



tk+1 + tk τ− 2

 ∂ 2 p(x, y, t

k+ 12 )

∂t 2

dτ . (tn+1 − τ )α

    + O (τ − tk+ 1 )2 + O ∆t 2



2

(7)

From [7], we get |E 0 | 6

C Γ (2 − α)



 2    ∂ p(x, y, t)  2−α 2  ∆t max  + T ∆t .  t∈[0,T ] ∂t 2

(8)

N N Now, denote by {Z n , W x,n , W y,n }n=1 the block-centered finite difference approximations to { p n , u x,n , u y,n }n=1 , respectively. Then the scheme is as follows,

335

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Case I: n = 0 [dt Z ]i,1 j +

∆t 1−α [dt Z ]i,1 j + [Dx W x ]i,1 j + [D y W y ]i,1 j = f (Z i,0 j ) + gi,1 j , Γ (2 − α)

(9)

Case II: n > 1 [Dt Z ]i,n+1 j +

n ∆t 1−α  n−1 n+1 x n+1 y n+1 n G α [dt Z ]i,k+1 j + [Dx W ]i, j + [D y W ]i, j = 2 f (Z i, j ) − f (Z i, j ) + gi, j , (10) Γ (2 − α) k=0 n−k

and W x,n+1 = −[a x dx Z ]n+11 , 1 i+ 2 , j

y,n+1 i, j+ 21

W

i+ 2 , j

= −[a y d y Z ]n+1 1 , i, j+ 2

1 ≤ i ≤ N x − 1, 1 ≤ j ≤ N y ,

(11)

1 ≤ i ≤ N x , 1 ≤ j ≤ N y − 1.

(12)

Set the initial approximation as follows: Z i,0 j = p0i, j ,

1 ≤ i ≤ Nx , 1 ≤ j ≤ N y .

It is easy to see that at each time level, the difference scheme (9)–(12) is a linear pentadiagonal system with strictly diagonally dominant coefficient matrix, thus the approximate solution exists uniquely. 4. Stability and error analysis 4.1. The analysis of stability for discrete scheme In this subsection, we will give the analysis of stability for the case 0 < α < 1. Multiplying (10) by Z i,n+1 j h i k j and making summation on i, j for 1 ≤ i ≤ N x , 1 ≤ j ≤ N y , n > 1, we have 

n   ∆t 1−α  G αn−k dt Z k+1 , Z n+1 m m Γ (2 − α) k=0     + Dx W x,n+1 , Z n+1 + D y W y,n+1 , Z n+1 m m     = 2 f (Z n ) − f (Z n−1 ), Z n+1 + g n+1 , Z n+1 .

Dt Z n+1 , Z n+1



+

m

(13)

m

Firstly, we estimate the first term on the left hand side of Eq. (13).    2     2    1 2  n+1 2  n+1    − Zn −  Z n m + 2Z n − Z n−1  Dt Z n+1 , Z n+1 = Z  + 2Z m m m m 4∆t  2    + Z n+1 − 2Z n + Z n−1  . m

(14)

Then the second term on the left of Eq. (13) can be transformed as follows: n n     ∆t −α  ∆t 1−α  G αn−k dt Z k+1 , Z n+1 = G αk Z n−k+1 − Z n−k , Z n+1 m m Γ (2 − α) k=0 Γ (2 − α) k=0   n−1   2       ∆t −α   = G α0 Z n+1  − G αk − G αk+1 Z n−k , Z n+1 − G αn Z 0 , Z n+1 . m m m Γ (2 − α) k=0

(15)

Next we estimate the last two terms on the left of Eq. (13). Using Lemma 3, we obtain 

Dx W

x,n+1

,Z

n+1





m

=− W

x,n+1

, dx Z

n+1



 x

= W

x,n+1

,

1 a x,n+1

W

x,n+1

 x

 2   1 x,n+1   = √ W  , (16) x,n+1 a x

336

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348



Dy W

y,n+1

,Z

n+1



 m

=− W

y,n+1

, dy Z

n+1



 y

= W

y,n+1

,

1 a y,n+1

W

y,n+1

 y

 2   1 y,n+1   = √ W  . (17) y,n+1 a y

Then combining Eq. (13) with Eqs. (14)–(16) and multiplying both sides of Eq. (13) by 4∆t give that  2  2    2      2       n+1 2  n+1 + Z n+1 − 2Z n + Z n−1  −  Z n m + 2Z n − Z n−1  − Zn  + 2Z Z m m m m   n−1       4∆t 1−α 2  α  + (G k − G αk+1 ) Z n−k , Z n+1 − G αn Z 0 , Z n+1 G α0 Z n+1  − m m m Γ (2 − α) k=0  2  2     1 1 x,n+1  y,n+1   +  √ x,n+1 W  +  √ y,n+1 W  a a x y     = 4∆t 2 f (Z n ) − f (Z n−1 ), Z n+1 + 4∆t g n+1 , Z n+1 . m

m

(18)

In particularly, observing that G αk − G αk+1 > 0, the smoothness assumption A1, and using Cauchy–Schwarz inequality, we obtain    2     2  2  n+1 2  n+1    − Zn −  Z n m + 2Z n − Z n−1  Z  + 2Z m m m  2    1−α 2 4∆t    n+1  + Z n+1 − 2Z n + Z n−1  + Z  m m Γ (2 − α)   n−1      2 2∆t 1−α  α  n−k 2  n+1 2 α α  0 6 (G − G k+1 ) Z  + Z  + G n Z  m m m Γ (2 − α) k=0 k   2  2  2  2        (19) + C∆t  Z n m + Z n−1  + Z n+1  + g n+1  . m

m

m

Consequently, we can obtain    n−1   2     2∆t 1−α   n+1 2  n+1  n+1 2  α  n−k 2 n −Z  + G k+1 Z Z  + 2Z  Z  + m m m m Γ (2 − α) k=0  n−1 2   2   2  2∆t 1−α  2∆t 1−α α       2 + 6  Z n m + 2Z n − Z n−1  G αk Z n−k  + G n Z 0  m m m Γ (2 − α) k=0 Γ (2 − α)          2  2  2  2 + C∆t  Z n m + Z n−1  + Z n+1  + g n+1  . m

m

(20)

m

Furthermore, noting that n−1 n n      2   2  2   n+1 2  α  n−k 2       G k+1 Z G αk Z n+1−k  = G αk Z n+1−k  , Z  +  = G α0 Z n+1  + m

m

k=0

m

k=1

m

k=0

m

(21)

and let n−1  2 2  2  2∆t 1−α      G αk Z n−k  , Q(Z n ) =  Z n m + 2Z n − Z n−1  + m m Γ (2 − α) k=0

(22)

then we obtain Q(Z

n+1

         n 2  2∆t 1−α α   0 2  n−1 2  n+1 2  n+1 2   Z m + Z ) − Q(Z ) 6 G Z  + C∆t  + Z  + g  . m m m m Γ (2 − α) n n

(23)

337

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Summing on n, n = 1, . . . , M − 1(M 6 N ) and noting ∥Z n ∥2m 6 Q(Z n ), we give Q(Z M ) 6 Q(Z 1 ) +

M−1  2  2   2∆t 1−α M−1     Q(Z n ) + C∆t Z 0  G αn Z 0  + C∆t m m Γ (2 − α) n=1 n=1

+ C∆t Q(Z M ) + C∆t

M−1  n=1

  n+1 2 g  .

(24)

m

We can choose ∆t such that 1 − C∆t > 0, applying Gronwall’s lemma gives that   M−1   2  2   2∆t 1−α M−1  n+1 2  0 M 1 α  0 Q(Z ) 6 C Q(Z ) + C G Z  + ∆t Z  + ∆t g  m m m Γ (2 − α) n=1 n n=1   M−1   2   2∆t 1−α  1−α  n+1 2   6 C Q(Z 1 ) + C M − 1 + ∆t Z 0  + C∆t g  m m Γ (2 − α) n=1 M−1  2      n+1 2 6 C Q(Z 1 ) + C Z 0  + C∆t g  . m

(25)

m

n=1

For the next procedure, we need to consider the estimation of Q(Z 1 ). Similar to the case n > 1, we can easily obtain that     2  2 1 2∆t 1−α   1 2     (26) 1+ Z  6 C Z 0  + C∆t g 1  , m m m 4 Γ (2 − α) then  2  2     Q(Z 1 ) 6 C Z 0  + C∆t g 1  . m

(27)

m

Combining the inequality (25) with (27), we get M−1 M    2       n+1 2 g n 2 . Q(Z M ) 6 C Z 0  + C∆t g  ≤ C ∥ p0 ∥2M + C∆t m M

m

n=1

(28)

n=1

Thus, we can get the stability for the scheme (9)–(12). Theorem 4. For the scheme (9)–(12), we have the following stable inequality Q(Z M ) 6 C ∥ p0 ∥2M + C∆t

M    g n 2 , m n=1

 2   t 1−α n−1 α  n−k 2 where Q(Z n ) = ∥Z n ∥2m + 2Z n − Z n−1 m + Γ2∆(2−α) . k=0 G k Z m 4.2. Optimal error results In this subsection, error estimates using the given schemes are obtained. Firstly, we quote the following result.    2 2 n 2 n   k 2j ∂ 2 pi,n j h i2 ∂ pi, j h ∂ p k2 ∂ 2 p n Lemma 5. Denote δi, j = 8 ∂ x 2 + 8 ∂ y 2 = 8 ∂ x 2 + 8 ∂ y 2 = O h 2 + k 2 , then i, j

Dt δi,n j =

3δi,n+1 j

− 4δi,n j

+ δi,n−1 j

2∆t

  = O h2 + k2 ,

and dt δi,n j = dt



h2 ∂ 2 p k2 ∂ 2 p + 8 ∂x2 8 ∂ y2

n = i, j

h i2 dt 8



∂2 p ∂x2

n + i, j

k 2j 8

 dt

∂2 p ∂ y2

n i, j

  = O h2 + k2 .

338

If

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348 ∂2 p ∂x2

∈ C 2 (0, T ), then we have

n ∆t 1−α  G α dt δ k+1 Γ (2 − α) k=0 n−k i, j    2 k+1   2 k+1  n n k 2j h i2 ∆t 1−α  ∂ p ∆t 1−α  ∂ p α α G dt G dt = + 8 Γ (2 − α) k=0 n−k 8 Γ (2 − α) k=0 n−k ∂ x 2 i, j ∂ y 2 i, j   = O h2 + k2 .

The proof of this lemma can be obtained easily. From Eq. (1), we have Case I: n = 0   4  ∆t 1−α 1 1+ [dt p]i,1 j + [Dx u x ]i,1 j + [D y u y ]i,1 j = f ( pi,0 j ) + gi,1 j + E k,i, j, Γ (2 − α) k=1

(29)

Case II: n > 1 [Dt p]i,n+1 j +

m ∆t 1−α  x n+1 y n+1 G α [dt p]i,k+1 j + [Dx u ]i, j + [D y u ]i, j Γ (2 − α) k=0 n−k

n+1 = 2 f ( pi,n j ) − f ( pi,n−1 j ) + gi, j +

6 

n+1 E k,i, j,

(30)

k=3

where 1 E 1,i, j

=

[dt p]i,1 j

∂p − ∂t 

1

,

i, j

1 1 0 E 2,i, j = f ( pi, j ) − f ( pi, j ), n  n+1 ∆t 1−α  k+1 n+1 α C α G [d p] − E 3,i, = D p , t 0 t i, j j i, j Γ (2 − α) k=0 n−k  y n+1  x n+1 ∂u ∂u n+1 y n+1 x n+1 + [D y u ]i, j − , E 4,i, j = [Dx u ]i, j − ∂ x i, j ∂ y i, j  n+1 ∂p n+1 n+1 , E 5,i, = [D p] − t i, j j ∂t i, j   n+1 n+1 n−1 n E 6,i, j = f ( pi, j ) − 2 f ( pi, j ) − f ( pi, j ) .

Subtracting Eq. (29) from (9) and Eq. (30) from (10), we obtain the error equations. Case I: n = 0   ∆t 1−α 1+ [dt ( p − Z )]i,1 j + [Dx (u x − W x )]i,1 j + [D y (u y − W y )]i,1 j Γ (2 − α) 4  1 = f ( pi,0 j ) − f (Z i,0 j ) + E k,i, j,

(31)

k=1

Case II: n > 1 [Dt ( p − Z )]i,n+1 j +

m ∆t 1−α  x x n+1 y y n+1 G α [dt ( p − Z )]i,k+1 j + [Dx (u − W )]i, j + [D y (u − W )]i, j Γ (2 − α) k=0 n−k

6    n−1 n+1 n = 2 f ( pi,n j ) − f ( pi,n−1 ) − 2 f (Z ) − f (Z ) + E k,i, i, j j i, j j. k=3

(32)

339

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Then, in order to simplify the notations, we define ξi,n j = [ p − Z − δ]i,n j ,  x  x,n x n ηi+1/2, j = u − W i+1/2, j ,  n y,n ηi, j+1/2 = u y − W y i, j+1/2 , thus the error equations can be rewritten as follows: Case I: n = 0   ∆t 1−α 1+ [dt ξ ]i,1 j + [Dx η x ]i,1 j + [D y η y ]i,1 j Γ (2 − α) = f ( pi,0 j ) − f (Z i,0 j ) − dt δi,1 j −

4  ∆t 1−α 1 G α0 dt δi,1 j + E k,i, j, Γ (2 − α) k=1

(33)

Case II: n > 1 m ∆t 1−α  y n+1 x n+1 G α [dt ξ ]i,k+1 j + [Dx η ]i, j + [D y η ]i, j Γ (2 − α) k=0 n−k   n−1 n+1 n = 2 f ( pi,n j ) − f ( pi,n−1 j ) − 2 f (Z i, j ) − f (Z i, j ) − Dt δi, j

[Dt ξ ]i,n+1 j +



n 6  ∆t 1−α  n+1 G αn−k dt δi,k+1 + E k,i, j j. Γ (2 − α) k=0 k=3

(34)

Firstly, multiplying (34) by ξi,n+1 j h i k j and making summation on i, j for 1 ≤ i ≤ N x , 1 ≤ j ≤ N y , we have that 

Dt ξ n+1 , ξ n+1



+ m

n       ∆t 1−α  G αn−k dt ξ k+1 , ξ n+1 + Dx η x,n+1 , ξ n+1 + D y η y,n+1 , ξ n+1 m m m Γ (2 − α) k=0

6        n−1 n+1 n n+1 n+1 = 2 f ( pi,n j ) − f ( pi,n−1 ) − 2 f (Z ) − f (Z ) , ξ + E , ξ i, j j i, j k,i, j m

 − 

Dt δi,n+1 j +

= S1 , ξ n+1

 m

n ∆t 1−α  G α dt δ k+1 , ξ n+1 Γ (2 − α) k=0 n−k i, j



+ S2 , ξ n+1



+ m

6  

n+1 n+1 E k,i, j, ξ

k=3

k=3

m

 m

 m

.

We estimate the first term on the left hand side of Eq. (35).    2     2    1  n+1 2  n+1  n n n n−1   + Dt ξ n+1 , ξ n+1 = − ξ − ξ + − ξ ξ  2ξ  2ξ  m m m m m 4∆t   2   + ξ n+1 − 2ξ n + ξ n−1  . m

(35)

(36)

The second term on the left hand side of Eq. (35) can be transformed into the following: n n     ∆t −α  ∆t 1−α  G αn−k dt ξ k+1 , ξ n+1 = G αk ξ n−k+1 − ξ n−k , ξ n+1 m m Γ (2 − α) k=0 Γ (2 − α) k=0   n−1         n−k n+1  ∆t −α α n+1 n+1 α α α 0 n+1 = G0 ξ ,ξ − G k − G k+1 ξ ,ξ − Gn ξ , ξ . m m m Γ (2 − α) k=0

(37)

340

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

And taking notice of Lemmas 1–3, we can get that  

Dx η x,n+1 , ξ n+1 Dy η

y,n+1



n+1

 m

   = − η x,n+1 , dx ξ n+1 = η x,n+1 , x



 m

=− η

y,n+1

, dy ξ

n+1



 y

= η

y,n+1

,

1

a

η x,n+1 x,n+1 1

a

η y,n+1

y,n+1

 x

 y

  + η x,n+1 , ϵ x,n+1 ( p) ,

(38)

  + η y,n+1 , ϵ y,n+1 ( p) .

(39)

x

y

Next we estimate the terms on the right hand side of Eq. (35). First, using triangle inequality and the smoothness assumption A2, we obtain that 2     2            S1 , ξ n+1  6 C 2 f ′ (θ n ) p n − Z n − f ′ (θ n−1 ) p n−1 − Z n−1  + C ξ n+1  m m m  2  2   n 2      (40) 6 C ξ m + ξ n−1  + ξ n+1  . m

m

Recalling Lemma 5 and using Cauchy–Schwarz inequality, we have that   2          S2 , ξ n+1  6 C ξ n+1  + O h 4 + k 4 .

(41)

Furthermore, observing Eq. (6), we get that   2      n+1 n+1     E3 , ξ  6 C ξ n+1  + O ∆t 4−2α .

(42)

m

m

m

m

We use Taylor’s expansion and Cauchy–Schwarz inequality to get the estimates of the last three terms in the right hand side of Eq. (35).  2         n+1 n+1   6 C ξ n+1  + O h 4 + k 4 ,  E4 , ξ m m         n+1 n+1   n+1 2 (43)  E5 , ξ  6 C ξ  + O ∆t 4 , m m   2      n+1 n+1     E6 , ξ  6 C ξ n+1  + O ∆t 4 . m

m

Combining Eq. (35) with Eqs. (36)–(43) and noting G αk − G αk+1 > 0, we give    2     2   2  1  n+1 2  n+1  n  n+1 n n n−1  n n−1   −ξ  − ξ m + 2ξ − ξ + ξ − 2ξ + ξ ξ  + 2ξ   m m m m 4∆t       −α ∆t 1 1  2 G α0 ξ n+1  + η x,n+1 , x,n+1 η x,n+1 + η y,n+1 , y,n+1 η y,n+1 + m Γ (2 − α) a a x y   n−1       ∆t −α G αk − G αk+1 ξ n−k , ξ n+1 + G αn ξ 0 , ξ n+1 6 m m Γ (2 − α) k=0     − η x,n+1 , ϵ x,n+1 ( p) − η y,n+1 , ϵ y,n+1 ( p) x y  2  2     n 2      + C ξ m + ξ n−1  + ξ n+1  + O h 4 + k 4 + ∆t 4−2α m m   n n−1    2  2 −α   2 ∆t       G α ξ n−k  − G αk ξ n+1−k  + G αn ξ 0  6 m m m 2Γ (2 − α) k=0 k k=0   2  2  2      + C ξ n m + ξ n−1  + ξ n+1  m



+ O h 4 + k 4 + ∆t 4−2α



m

      x,n+1 2  y,n+1 2 + ε η  + η  , x

y

(44)

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

341

where we use the following two equalities n−1  k=0

n n 2 2   2  2            G αk ξ n+1−k  − ξ n+1  , G αk ξ n+1−k  = G αk+1 ξ n−k  = m

G αn +

n−1  

m

k=0

m

k=1

m

(45)

 G αk − G αk+1 = 1.

k=0

Multiplying both sides of inequality (44) by 4∆t, then summing over n, n = 1, 2, . . . , N − 1, we have that Q(ξ ) + C∆t N

N −1  

     x,n+1 2  y,n+1 2   + η η x

n=1

6 Q(ξ 1 ) + C

y

N −1 

−1  2  2   ∆t 1−α N     ∆t ξ n+1  + C G αn ξ 0  + C h 4 + k 4 + ∆t 4−2α m m Γ (2 − α) n=1 n=1

6 Q(ξ 1 ) + C∆t

N 

  Q(ξ n ) + C h 4 + k 4 + ∆t 4−2α .

(46)

n=0

For the next estimation, we have to give error analysis for Q(ξ 1 ). Multiplying (33) by ξi,1 j h i k j , making summation on i, j for 1 ≤ i ≤ N x , 1 ≤ j ≤ N y and using Cauchy–Schwarz inequality, we have that   2 2  2   ∆t 1−α   1      1 2 ξ  + ξ  + C∆t η x,1  + η y,1  m x y m Γ (2 − α)    2     1−α 2 ∆t  0   6 ξ 0  + ξ  + 2∆t f ( p 0 ) − f (Z 0 ), ξ 1 + O h 4 + k 4 + ∆t 4−2α m m m Γ (2 − α)  2     + ε ξ 1  + C∆t 2 ∥E 1 ∥2m + ∥E 2 ∥2m m

 2   2   1  2 ∆t 1−α     0 2     6 ξ 0  + ξ  + C ξ 0  + O h 4 + k 4 + ∆t 4−2α + ξ 1  . m m m m Γ (2 − α) 2  0  0  2  Note that ξ m = δ m = O h + k 2 , we have   2  2   2∆t 1−α   1 2     Q(ξ 1 ) = ξ 1  + 2ξ 1 − ξ 0  + ξ  6 O h 4 + k 4 + ∆t 4−2α . m m m Γ (2 − α)

(47)

(48)

By Eqs. (46) and (48), we have that Q(ξ N ) + C∆t

N −1  n=1

N       η x,n+1 2 + η y,n+1 2 6 C∆t  Q(ξ n ) + O h 4 + k 4 + ∆t 4−2α . x y n=0

Choosing ∆t sufficiently small and applying Gronwall’s inequality, we have Q(ξ N ) + C∆t

N −1   n=1

       x,n+1 2  y,n+1 2 η  + η  6 O h 4 + k 4 + ∆t 4−2α . x

y

(49)

Thus N −1  −1    2   2  2   2∆t 1−α N   x,n+1 2  y,n+1 2   N   G αk ξ N −k  + C∆t η  + η  ξ  + 2ξ N − ξ N −1  + x y m m m Γ (2 − α) k=0 n=1   6 O h 4 + k 4 + ∆t 4−2α .

(50)

342

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Then we can deduce that −1 N 2   2   2∆t 1−α  2∆t 1−α N     G αk ξ N −k  = G αN −k ξ k  6 C h 4 + k 4 + ∆t 4−2α . m m Γ (2 − α) k=0 Γ (2 − α) k=1 Note that   G αN −k > G αN −1 = N 1−α − (N − 1)1−α = O ∆t α , and the estimation of δ, we have the following theorem. Theorem 6. Suppose the analytical solution is sufficiently smooth, then there exists a positive constant C independent of h, k and △t such that  1 l     2  y,n 2  2    l x,n x,n l y,n W − u  + W − u  ≤ C h 2 + k 2 + △t 2−α , ∀1 ≤ l ≤ N , Z − p  + C△t x

M

y

n=1

and 

l  2    ∆t (Z − p)k 

 21

m

k=1

  6 C h 2 + k 2 + ∆t 2−α , ∀1 ≤ l ≤ N .

5. Numerical tests 5.1. The CGS method The Conjugate Gradient Squared algorithm is one of the most effective tools for solving nonsymmetric positive definite system. Ax = b,

A ∈ Rn×n , b ∈ Rn .

(51)

Let x0 be a starting estimate of the solution x of the nonsingular linear system Ax = b, and let  r0 be suitably chosen. Then the CGS algorithm reads [37]: r0 := b − Ax0 ; q0 := p−1 := 0; θ−1 := 1; n := 0; while residual > tolerance do begin θn :=  r0T rn ; βn :=

θn θn−1

;

un := rn + βn qn ; pn := un + βn (qn + βn pn−1 ); vn := Apn ; θn ; σn := un − ωn vn ;

σn :=  r0T vn ; ωn := qn+1

rn+1 := rn − ωn A(un + qn+1 ); xn+1 := xn + ωn (un + qn+1 ); n := n + 1. end.

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

343

From above description, we know that an important advantage of the CGS algorithm nested loop, is not having synchronization requirements. The synchronization methods are computationally expensive and require special care during development and testing. So in CGS algorithm, adding OpenMP instructions to the circulation calculations in an iterative step can realize highly effective parallel computing. 5.2. Parallel model implementation Multi-core computing is a mature, relevant and extended technology in HPC computing. Although the use of OpenMP is relatively easy, it still requires some work to achieve an efficient use of computing resources. OpenMP provides constructs for sharing work among threads, e.g., to assign the iterations of a loop to individual threads. The directive provided by OpenMP to automatically parallelize a region comprised of a loop, is called parallel do, and causes the loop iterations to be distributed among the threads of the block. So the key to a parallel implementation based on shared memory is to determine the main loops that can be run separately in the model. OpenMP parallel instructions [38,39] can then be used to allocate these uncorrelated units to different processors, decreasing the calculation time. In our model, because the simulation is iterated over a series of time steps, it is most beneficial if all the calculation units in a time step can be run separately. Different vector multiply vector calculations are distributed to different threads by using OpenMP. In our OpenMP data model, shared variables are accessed by all threads, while private variables are copied separately for each thread, so each thread has its own version of the variable, thus avoiding concurrency issues or synchronization. For example, rn+1 := rn − αn A(un + qn+1 ) can be calculated using OpenMP as follows !$OMP PARALLEL D O REDUCTION(− : R) PRIVATE(J ) do j = 1, N r (:) = r (:) − ω A(:, j) (u( j) + q( j + 1)) end do !$OMP END PARALLEL D O 5.3. Some numerical examples In this subsection, the following example is carried out to test our theoretical results. The numerical results on uniform and nonuniform grids  difference α (0 < α < 1) are listed in Tables 1–3. In Tables 1 and 2, we choose  with 2 the optimal step size N = M 2−α , where M is the spatial partition and N is the temporal partition to obtain space convergence rate. Besides to  obtain  time convergence rate, we also give the errors and orders on nonuniform grids in 2−α Table 3 by choosing M = N 2 . Here the non-uniform grids are generated from the corresponding uniform grids by adding a random in both x and y directions (cf. Fig. 1). Example. Here the initial condition and the right hand side of the equation are computed according to the analytic solution given as below.  p(x, y, t) =  t cos(πx) cos(π y),    1 0 a(x, y, t) = , 0 1    u(x, y, t) = (−π t sin(π x) cos(π y), −π t cos(π x) sin(π y))T . Set T = 1 and the nonlinear term f ( p) = p − p 3 , then we get the forcing function  1−α  t 2 + (2π − 1)t + 1 cos(π x) cos(π y) + (t cos(π x) cos(π y))3 . g(x, y, t) = Γ (2 − α) In order to present the numerical method vividly, we give the scalar and flux solution figures of the example. Here, just as observed that u x (x, y, t) = u y (y, x, t), thus only the exact and numerical solutions of p and u x are presented. They are shown in Figs. 2 and 3 at T = 1 with the mesh size h = k = 1/20. From Tables 1–3 and Fig. 4, we can

344

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

Table 1 Errors and convergence rates for p and u x with uniform mesh. α

M×M

∥Z − p∥0,∞

Rate

∥W x − u x ∥0,2

Rate

∥Z − p∥0,2

Rate

1/2

8×8 27 × 27 64 × 64

5.9538E−3 5.0672E−4 8.9305E−5

– 2.02 2.01

5.0756E−3 4.0526E−4 6.9952E−5

– 2.01 2.03

3.5529E−3 2.9242E−4 5.1218E−5

– 2.05 2.01

2/3

9×9 16 × 16 25 × 25 36 × 36 49 × 49

4.5767E−3 1.4301E−3 5.8265E−4 2.8021E−4 1.5101E−4

– 2.02 2.01 2.01 2.02

3.6588E−3 1.1137E−3 4.4857E−4 2.1447E−4 1.1520E−4

– 2.06 2.03 2.02 2.02

2.6727E−3 8.2262E−4 3.3332E−4 1.5991E−4 8.6073E−5

– 2.04 2.02 2.01 2.01

Table 2 Errors and convergence rates for p and u x with nonuniform mesh. α 1/2

2/3

M×M

∥Z − p∥0,∞

Rate

∥W x − u x ∥0,2

Rate

∥Z − p∥0,2

Rate

10 × 10 20 × 20 40 × 40 80 × 80

3.8186E−3 9.5954E−4 2.3809E−4 5.8632E−5

– 2.01 2.03 2.02

3.5982E−3 8.0575E−4 1.9754E−4 4.7706E−5

– 2.18 2.04 2.04

2.2615E−3 5.5711E−4 1.3707E−4 3.3623E−5

– 2.04 2.04 2.02

10 × 10 20 × 20 40 × 40 80 × 80

3.7269E−3 9.4134E−4 2.3476E−4 5.8039E−5

– 2.00 2.02 2.01

3.3446E−3 7.5611E−4 1.8805E−4 4.7713E−5

– 2.16 2.02 1.98

2.1733E−3 5.4053E−4 1.3404E−4 3.3066E−5

– 2.02 2.03 2.01

Table 3 Errors and convergence rates for p and u x with nonuniform space mesh. α

∆t

∥Z − p∥0,∞

Rate

∥W x − u x ∥0,2

Rate

∥Z − p∥0,2

Rate

0.1

1/10 1/20 1/40 1/80 1/160

1.8232E−3 4.9849E−4 1.2836E−4 3.3774E−5 8.9311E−6

– 1.87 1.96 1.93 1.92

2.8657E−3 8.3394E−4 2.3085E−4 6.2849E−5 1.6688E−5

– 1.78 1.85 1.88 1.91

1.2258E−3 3.4486E−4 9.2689E−5 2.4963E−5 6.6358E−6

– 1.83 1.90 1.89 1.91

0.5

1/10 1/20 1/40 1/80 1/160

3.7092E−3 1.2000E−3 4.0766E−4 1.4225E−4 4.8440E−5

– 1.62 1.55 1.51 1.55

4.1431E−3 1.2783E−3 4.0516E−4 1.3039E−4 4.2901E−5

– 1.69 1.65 1.63 1.60

2.2769E−3 7.2117E−4 2.4074E−4 8.2887E−5 2.7994E−5

– 1.65 1.58 1.53 1.56

0.9

1/10 1/20 1/40 1/80 1/160

8.5549E−3 3.9235E−3 1.6345E−3 7.7460E−4 3.4379E−4

– 1.12 1.26 1.07 1.17

7.7025E−3 3.1582E−3 1.3882E−3 5.9993E−4 2.6885E−4

– 1.28 1.18 1.21 1.15

5.1417E−3 2.2906E−3 9.3803E−4 4.4092E−4 1.9480E−4

– 1.16 1.28 1.08 1.17

  see that the block-centered finite difference approximations for p and u x have the O ∆t 2−α + h 2 + k 2 accuracy in discrete L 2 -norms. There are two important performance evaluation indices for the parallel model: speedup, time and efficiency. The speedup formula is as follows: Sn = T /Tn , and the efficiency formula is as follows: E n = Sn /n,

345

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

0.2

0.4

0.6

0.8

1

Fig. 1. The nonuniform grid with 16 × 16 cells. Surface for numerical solution

Surface for exact solution

0.8

0.8 1

0.6

1

0.6

0.5

0.4

0.5

0.4 0.2

0.2 0

0

-0.5 -1 1

-0.2 -0.4 0.8

1 0.6

0.4

0.2

0 0

0.2

0.4

0.6

0.8

-0.6

0

0

-0.5 -1 1

-0.8

-0.2 -0.4 0.8

1 0.6

0.4

0.2

0

0

0.2

0.4

0.6

0.8

-0.6 -0.8

Fig. 2. Surface for exact solution p (left) and numerical solution Z (right). Surface for numerical solution Wx

Surface for exact solution ux

3

3 4

2

4

2

2

1

2

1

0

0

-2

0

0

-2 -1

-4 1 0.8

0.6

0.4

0.2

0

0

0.2

0.4

0.6

0.8

1 -2 -3

-1 -4 1 0.8

0.6

0.4

0.2

0

0

0.2

0.4

0.6

0.8

1 -2 -3

Fig. 3. Surface for exact solution u x (left) and numerical solution W x (right).

where T is the serial computing time, Tn is the computation time with n parallel threads and n is the number of parallel threads. The computation resources available for this study are one node with multi-core Intel Xeon CPUs. From Table 4 and Fig. 5, we know that the parallel model based on OpenMP shared memory shows a good performance in CGS

346

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348 10-2

10-2 ||Z-p||0,∞

||Z-p||0,∞

||W-ux||0,2

||W-ux||0,2

||Z-p||0,2

||Z-p||0,2

log10Error

log10Error

10-3

10-4

10-3

10-4

10-5 10-2

10-1 log10h

10

100

-5

10-2

10-1 log10h

100

Fig. 4. Convergence rates for p and u x with uniform mesh (left) and nonuniform mesh (right). x104

4.5

N=64x64 4

N=81x81 N=100x100

3.5

Runtime (s)

3 2.5 2 1.5 1 0.5 0

1

2

3

4

5

6

7

8

Thread number

Fig. 5. Parallel CGS execution time for different problem sizes with different number of threads at α = 1/2. Table 4 T = Time (s), S = speedup and E = efficiency (%) of different calculation schemes. Scheme

Grid number

1

64 × 64

2

81 × 81

3

100 × 100

T S E T S E T S E

Thread count 1

2

4

8

3,568 – – 13,176 – – 42,641 – –

1,892 1.93 96.6 6891 1.92 95.6 22,948 1.86 92.9

1,069 3.33 83.4 3932 3.35 83.7 11,840 3.60 90.0

653 5.46 68.3 2243 5.87 73.4 6915 6.16 77.1

algorithm. We employ 1, 2, 4 and 8 threads to execute the program. As shown in Table 4, for the same grid number, the computation time reduces as the number of threads increases. For example, Scheme 3 (10,000 grid cells), takes

347

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348 1 N=64x64 N=81x81

0.95

N=100x100

Efficiency

0.9

0.85

0.8

0.75

0.7

1

2

3

4

5

6

7

8

Thread Number

Fig. 6. Parallel CGS Efficiency for different problem sizes with different number of threads at α = 1/2.

42,641 s with the serial scheme, but requires only 11,840 s with four threads, a time saving of about 72.2%. For 8 threads, only 6915 s are needed, some 83.7% less than with the serial model. The parallel model therefore plays an important role in accelerating the model operation period. Furthermore, the speedup factor increases with the number of threads and the efficiency reduces as the number of threads increases. For example in Scheme 2, speedup factors of 1.92, 3.35, and 5.87 are achieved with 2, 4 and 8 threads, respectively, demonstrating great extendibility. And the efficiency drops from 93.7% with two threads to just 48.8% with 8 threads, which indicates that the overall model performance is deteriorating. As shown in Fig. 6, we also note that for a fixed number of threads, the efficiency increases with the number of grid cells. The maximum time saving occurs when moving from a serial scheme to two threads, and the computation time reduces gradually as the number of threads increases. 6. Conclusions We find that there are no articles on block-centered difference method for time fractional partial equations with nonlinear terms. In this paper our target is to present the block-centered finite difference method to solve a nonlinear time-fractional parabolic equation with the Neumann condition and a nonlinear reaction term. And second-order error estimates in spacial meshsize both for pressure and velocity in discrete L 2 norms are established on both uniform and non-uniform rectangular grids. The unconditionally stable result, which just depends on initial value and source item, is derived. In our model, because the simulation is iterated over a series of time steps, it is most beneficial if all the calculation units in a time step can be run separately. Then we use OpenMP parallel Conjugate Gradient Squared (CGS) algorithm to solve Ax = b. In CGS, adding OpenMP instructions to the circulation calculations in an iterative step can realize parallel computing. Different vector multiply vector calculations are distributed to different threads. The efficiency reduces as the number of threads increases while all are higher than 68%. Acknowledgments The authors contributed equally to this work. They would like to thank the editor and referees for their valuable comments and suggestions which helped them to improve the results of this paper. References [1] I. Podlubny, Fractional Differential Equations: An Introduction to Fractional Derivatives, Fractional Differential Equations, to Methods of their Solution and Some of their Applications, Vol. 198, Academic press, 1998. [2] V.J. Ervin, N. Heuer, J.P. Roop, Numerical approximation of a time dependent, nonlinear, space-fractional diffusion equation, SIAM J. Numer. Anal. 45 (2) (2007) 572–591.

348

Z. Liu, X. Li / Comput. Methods Appl. Mech. Engrg. 308 (2016) 330–348

[3] F. Liu, V. Anh, I. Turner, Numerical solution of the space fractional Fokker–Planck equation, J. Comput. Appl. Math. 166 (1) (2004) 209–219. [4] A. Saadatmandi, M. Dehghan, A tau approach for solution of the space fractional diffusion equation, Comput. Math. Appl. 62 (3) (2011) 1135–1142. [5] N.H. Sweilam, M.M. Khader, A. Nagy, Numerical solution of two-sided space-fractional wave equation using finite difference method, J. Comput. Appl. Math. 235 (8) (2011) 2832–2841. [6] W. Tian, H. Zhou, W. Deng, A class of second order difference approximations for solving space fractional diffusion equations, Math. Comp. 84 (294) (2015) 1703–1727. [7] Y. Liu, Y. Du, H. Li, S. He, W. Gao, Finite difference/finite element method for a nonlinear time-fractional fourth-order reaction–diffusion problem, Comput. Math. Appl. 70 (4) (2015) 573–591. [8] N. Zhang, W. Deng, Y. Wu, Finite difference/element method for a two-dimensional modified fractional diffusion equation, Adv. Appl. Math. Mech. 4 (2012) 496–518. [9] Y. Jiang, J. Ma, High-order finite element methods for time-fractional partial differential equations, J. Comput. Appl. Math. 235 (11) (2011) 3285–3290. [10] W. Li, X. Da, Finite central difference/finite element approximations for parabolic integro-differential equations, Computing 90 (3–4) (2010) 89–111. [11] F. Zeng, C. Li, F. Liu, I. Turner, The use of finite difference/element approaches for solving the time-fractional subdiffusion equation, SIAM J. Sci. Comput. 35 (6) (2013) A2976–A3000. [12] E. Sousa, A second order explicit finite difference method for the fractional advection diffusion equation, Comput. Math. Appl. 64 (10) (2012) 3141–3152. [13] E. Sousa, An explicit high order method for fractional advection diffusion equations, J. Comput. Phys. 278 (2014) 257–274. [14] E. Sousa, Finite difference approximations for a fractional advection diffusion problem, J. Comput. Phys. 228 (11) (2009) 4038–4054. [15] J. Huang, Y. Tang, L. V´azquez, J. Yang, Two finite difference schemes for time fractional diffusion-wave equation, Numer. Algorithms 64 (4) (2013) 707–720. [16] Z. S.W. Vong, A high order compact finite difference scheme for time for fractional Fokker–Planck equations, Appl. Math. Lett. 43 (2015) 38–43. [17] Y. Liu, H. Li, W. Gao, S. He, Z. Fang, A new mixed element method for a class of time-fractional partial differential equations, Sci. World J. (2014). [18] Y. Liu, Z. Fang, H. Li, S. He, A mixed finite element method for a time-fractional fourth-order partial differential equation, Appl. Math. Comput. 243 (2014) 703–717. [19] F. Liu, P. Zhuang, I. Turner, K. Burrage, V. Anh, A new fractional finite volume method for solving the fractional diffusion equation, Appl. Math. Model. 38 (15) (2014) 3871–3878. [20] L. Wei, Y. He, Analysis of a fully discrete local discontinuous galerkin method for time-fractional fourth-order problems, Appl. Math. Model. 38 (4) (2014) 1511–1522. [21] Y. Lin, C. Xu, Finite difference/spectral approximations for the time-fractional diffusion equation, J. Comput. Phys. 225 (2) (2007) 1533–1552. [22] Y. Lin, X. Li, C. Xu, Finite difference/spectral approximations for the fractional cable equation, Math. Comp. 80 (275) (2011) 1369–1396. [23] A. Atangana, D. Baleanu, Numerical solution of a kind of fractional parabolic equations via two difference schemes, in: Abstract and Applied Analysis, vol. 2013, Hindawi Publishing Corporation, 2013. [24] Y. Zhang, D.A. Benson, D.M. Reeves, Time and space nonlocalities underlying fractional-derivative models: Distinction and literature review of field applications, Adv. Water Resour. 32 (4) (2009) 561–581. [25] R. Schumer, D.A. Benson, M.M. Meerschaert, B. Baeumer, Fractal mobile/immobile solute transport, Water Resour. Res. 39 (10). [26] F. Liu, P. Zhuang, K. Burrage, Numerical methods and analysis for a class of fractional advection–dispersion models, Comput. Math. Appl. 64 (10) (2012) 2990–3007. [27] H. Zhang, F. Liu, M.S. Phanikumar, M.M. Meerschaert, A novel numerical method for the time variable fractional order mobile–immobile advection–dispersion model, Comput. Math. Appl. 66 (5) (2013) 693–701. [28] A. Ashyralyev, Z. Cakir, On the numerical solution of fractional parabolic partial differential equations with the Dirichlet condition, Discrete Dyn. Nat. Soc. (2012). [29] A. Ashyralyev, Z. Cakir, Fdm for fractional parabolic equations with the Neumann condition, Adv. Difference Equ. 2013 (1) (2013) 1–16. [30] P.-A. Raviart, J.-M. Thomas, A mixed finite element method for 2nd order elliptic problems, in: Mathematical Aspects of Finite Element Methods, Springer, 1977, pp. 292–315. [31] T. Arbogast, M.F. Wheeler, I. Yotov, Mixed finite elements for elliptic problems with tensor coefficients as cell-centered finite differences, SIAM J. Numer. Anal. 34 (2) (1997) 828–852. [32] H. Rui, H. Pan, A block-centered finite difference method for the Darcy–Forchheimer model, SIAM J. Numer. Anal. 50 (5) (2012) 2612–2631. [33] H. Rui, H. Pan, Block-centered finite difference methods for parabolic equation with time-dependent coefficient, Jpn. J. Ind. Appl. Math. 30 (3) (2013) 681–699. [34] X. Li, H. Rui, A two-grid block-centered finite difference method for nonlinear non-fickian flow model, Appl. Math. Comput. 281 (2016) 300–313. [35] X. Li, H. Rui, Characteristic block-centred finite difference methods for nonlinear convection-dominated diffusion equation, Int. J. Comput. Math. (2015) 1–19. [36] A. Ashyralyev, Well-posedness of fractional parabolic equations, Bound. Value Probl. 2013 (1) (2013) 1–18. [37] P. Sonneveld, Cgs, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Statist. Comput. 10 (1) (1989) 36–52. [38] S. Zhang, Z. Xia, R. Yuan, X. Jiang, Parallel computation of a dam-break flow model using openmp on a multi-core computer, J. Hydrol. 512 (2014) 126–133. [39] N. Sweilam, H. Moharram, N. Moniem, S. Ahmed, A. parallel, Crank–Nicolson finite difference method for time-fractional parabolic equation, J. Numer. Math. 22 (4) (2014) 363–382.