A parallel DSA algorithm in the 3-D cylindrical geometry

A parallel DSA algorithm in the 3-D cylindrical geometry

Annals of Nuclear Energy xxx (xxxx) xxx Contents lists available at ScienceDirect Annals of Nuclear Energy journal homepage: www.elsevier.com/locate...

2MB Sizes 1 Downloads 28 Views

Annals of Nuclear Energy xxx (xxxx) xxx

Contents lists available at ScienceDirect

Annals of Nuclear Energy journal homepage: www.elsevier.com/locate/anucene

A parallel DSA algorithm in the 3-D cylindrical geometry Longfei Xu a, Huayun Shen a,⇑, Junxia Wei a, Liangzhi Cao b, Youqi Zheng b a b

Institute of Applied Physics and Computational Mathematics, Beijing, China Xi’an Jiaotong University, Xi’an, Shaanxi, China

a r t i c l e

i n f o

Article history: Received 6 June 2019 Received in revised form 10 October 2019 Accepted 29 November 2019 Available online xxxx Keywords: DSA KBA Discrete-Ordinates method Neutron transport

a b s t r a c t A parallel DSA algorithm based on the red-black sweep PCG method is proposed to accelerate the neutron transport calculation in the 3-D cylindrical geometry. This parallel acceleration technique shares the same spatial domain decomposition strategy with the KBA method. The red-black sweep iteration is formed by moving the r- and h-axial components of the diffusion operator into the right hand side of the diffusion equation. Using the initial values refreshed by several red-black sweep iterations simplifies the matrix-vector multiplication in the CG procedure. Through the homogeneous problems, how the times of the red-black sweep effects the DSA algorithm is analyzed. A wide range of problems including FBR benchmark, PWR Pin-by-pin calculation and PWR shielding calculation have been performed to test the effectiveness and efficiency of the new method. The solution time is reduced by 3 ~ 10 times in different problems. This new acceleration algorithm has well parallel scalability. Ó 2019 Elsevier Ltd. All rights reserved.

1. Introduction Techniques for solving the NTE (neutron transport equation) are of great importance in nuclear science and engineering. However, the precise simulation of the 3-D NTE requires vast majority of computational resources, in terms of the memory and float point operations (Davidson et al., 2014). In the past decades, most researchers focused on the acceleration technique to improve the calculation efficiency (Adams and Larsen, 2002). With the development of high performance cluster, parallelization has become another useful way to improve the calculation efficiency. Therefore, how to match the parallelization and acceleration effectively is worthy of study. The Discrete-Ordinates method (SN) has been a popular technique for solving neutron transport equation due to its excellent efficiency and accuracy during the past decades (Carlson, 1953; Rhoades and Childs, 1988; Rhoades and Simpson, 1997; Alcouffe et al., 2005a,b; Evans et al., 2010). The DSA algorithm has become one of the most commonly used methods to accelerate the iteration procedure of the SN equation since it was proposed by Gelbard and Hageman (Gelbard and Hageman, 1969). This synthetic method was originally seen to be unstable for problems where the cell thickness exceeded roughly a mean free path (Reed, 1971). Based on the diamond difference scheme, Alcouffe showed that this instability could be eliminated if the spatial discretization ⇑ Corresponding author. E-mail address: [email protected] (H. Shen).

of the diffusion equation was chosen to be consistent with the spatial discretization of the SN equation (Alcouffe, 1977). The detailed derivation of this consistent DSA method in the 2-D Cartesian geometry was given by Alcouffe, after which it was generalized into the 3-D Cartesian geometry both for difference method (Bando et al., 1985) and nodal method (Azmy and Dorning, 1985). Soon afterward, Alcouffe’s ideas were extended to nondiamond schemes (Larsen, 1982; Warsa et al., 2001) and highorder diamond scheme (Martin and Hébert, 2009). The DSA method to accelerate the higher-order flux moment has also been well studied since 1982 (Morel, 1982; Khattab and Larsen, 1991; Mansur et al., 2014). Most of the past researches focus on the Cartesian geometry or low-dimensional curvilinear coordinate systems. However the cylindrical meshes are widely used in reactor physical calculations due to the rotational shapes of the core barrel, pressure vessel and biological shielding area etc. The angular redistribution in the streaming operator makes the transport equation more complex than that in the Cartesian geometry. Therefore, making a study of the DSA algorithm in the high-dimensional curvilinear geometry is necessary. In this paper, we give the detailed derivation of the DSA method in the 3-D cylindrical geometry. It will provide useful reference to other researchers. With the development of the high performance cluster, the parallel transport calculation has become more and more popular in the nuclear community (Azmy, 1992). PENTRAN (Sjoden and Haghighat, 1997), PARTISN (Alcouffe et al., 2005a,b), DENOVO (Evans et al., 2010), PDT (Bailey et al., 2011), DOMINO (Courau et al., 2013), JSNT (Cheng et al., 2017a,b), NECP-Hydra (Xu et al.,

https://doi.org/10.1016/j.anucene.2019.107236 0306-4549/Ó 2019 Elsevier Ltd. All rights reserved.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

2

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

2017), ARES (Cheng et al., 2017a,b) etc. are all the parallel SN codes based on regular grids. The SN spatial sweep is a complete boundary-to-boundary sweep of all directions. These upstreamdownstream dependencies make the SN spatial sweep difficult to be parallelized. The KBA algorithm (Koch et al., 1991) keeping the wave-front sweep nature along each discrete direction is found to be one of most effective parallel methods for SN solver on structured grids. This nature ensures the KBA algorithm holds the same convergence as the number of processors increases. Therefore, the KBA algorithm is an ideal choice to perform the massive transport calculation on nowadays’ super computer. Based on the frame of the KBA algorithm, Evans, Davidson and Slaybaugh have done lots of studies on the parallel acceleration techniques in their code DENOVO such as parallel GMRES, DSA-preconditioned GMRES, multi-group GMRES and so on (Evans et al., 2010; Davidson et al., 2014). These Krylov sub-space based methods are very powerful when the spatial difference scheme is linear. However, the positivity and accuracy are the double-edges swords for spatial difference schemes (Lathrop, 1969). The linear spatial difference schemes hardly guarantee the positivity and accuracy at the same time. In realistic applications, the non-linear spatial difference schemes such as theta-weighted diamond difference (Rhoades and Engle, 1977), diamond difference with set-to-zero fixup etc. are widely employed. And the DSA algorithm can still be used to accelerate the SN equation with these non-linear spatial difference schemes (Roberts and Anistratov, 2007). However, to our konwledge, there are few researches focusing on how to combine the KBA and DSA algorithms effectively. In this paper, the DSA method is implemented in the parallel frame of the KBA algorithm. Solving the diffusion equation efficiently is of great importance for the DSA algorithm, otherwise an effective-but-inefficient DSA method may be obtained. The CG (Conjugate Gradient) method is often applied to solve the diffusion equation. Preconditioning plays a crucial role in the design of practical CG method. Moreover, the choices of the initial fluxes do affect the number of iterations required to converge to a given tolerance. In the standard CG procedure, the matrix-vector multiplication is the most time-consuming part. Simplifying the matrix-vector multiplication will help to improve the efficiency of the diffusion solver. The above three aspects are concerned in this paper to form an efficient diffusion solver. In this paper, a study of an efficient parallel DSA algorithm matching the KBA method has been carried out in the 3-D cylindrical coordinate system. The rest of this paper is organized as follows: Section 2 gives the detailed derivation of the DSA method in the 3-D cylindrical geometry. In Section 3, the standard KBA algorithm is briefly described. The red-black sweep preconditioned CG method is presented in Section 4. The numerical analysis of the parallel DSA method is given in Section 5. Conclusions are summarized in Section 6.

Fig. 1. The 3-D cylindrical coordinate system.

Integrating each term of Eq. (1) by the operator RRR 2prðÞdrdhdz in a control volume shown in Fig. 2 yields the neutron balance equation h

i

h

l Aiþ1=2;j;k wiþ1=2;j;k  Ai1=2;j;k wi1=2;j;k þ gBi;k wi;jþ1=2;k  wi;j1=2;k

i

h i h i nC i;j wi;j;kþ1=2  wi;j;k1=2  V i;j;k r1 @ ð@gxwÞ þ V i;j;k Rt wi;j;k ¼ V i;j;k 41p ðRs / þ Q Þi;j;k i

ð2Þ

where the A, B and C are the surface areas with respect to the r-, hand z-axes. V is the volume of the cell. In deriving Eq. (2), we have used the following approximation

Z Z Z

2pr

Dzk Dhj Dr i

  1 @ ðgwÞ 1 @ ðgwÞ drdhdz  V i;j;k r @x ri @ x

ð3Þ

2. The DSA algorithm in 3-D cylindrical geometry In this section, the discrete form of the DSA equation is derived in the R-h-Z geometry based on the work of (Alcouffe, 1977; Bando et al., 1985). The form of the DSA equation in the X-Y-Z geometry can be regarded as the special case in the R-h-Z geometry. So, all the conclusions can be generalized to the X-Y-Z geometry by removing the angular redistribution term. The steady-state NTE in the 3-D cylindrical geometry (Fig. 1) is

l @ ðrwÞ g @w r

@r

þ

r @h

þn

@w 1 @ ðgwÞ 1 þ Rt w ¼ ðRs / þ Q Þ  @z r @ x 4p

ð1Þ

where Q contains the fission and external sources. We have assumed isotropic scattering and sources for simplicity. The DSA algorithm also applies to the more general anisotropic problems.

Fig. 2. The control volume cell in the 3-D cylindrical geometry.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

3

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

We now seek the form of the source-corrected DSA in the R-h-Z geometry such that the leakage correction is compatible with the Eq. (2). Firstly, we define the angular flux moments as

Z

/nm ¼

Y nm wdX

ð4Þ

where Y nm are the normalized spherical harmonics. The first nine items of Y nm are given in Table 1. Integrating each term of Eq. (2) over all directions yields the P1 form equation in cell (i, j, k)

h

i Aiþ1=2;j;k /01;iþ1=2;j;k  Ai1=2;j;k /01;i1=2;j;k h i h i 1 þ Bi;j;k /11;i;jþ1=2;k  /11;i;j1=2;k þ C i;j;k /1 1;i;j;kþ1=2  /1;i;j;k1=2 þ V i;j;k RR;i;j;k /i;j;k ¼ V i;j;k Q i;j;k

ð5Þ

where RR is the macro removal cross-section

ð6Þ

It should be noted that the integral value of the angular redistribution over all directions is equal to zero

4p

 @ ðgwÞ dX ¼ 0 @x

ð7Þ

Adding up the similar equations in cells (i + 1, j, k), (i, j + 1, k), (i + 1, j + 1, k), (i, j, k + 1), (i + 1, j, k + 1), (i, j + 1, k + 1) and (i + 1, j + 1, k + 1), we obtain h i        A/01 iþ1;j;k þ A/01 iþ1;jþ1;k þ A/01 iþ1;j;kþ1 þ A/01 iþ1;jþ1;kþ1 h i         14 A/01 i;j;k þ A/01 i;j;kþ1 þ A/01 i;jþ1;k þ A/01 i;jþ1;kþ1 h i þ 14 Bi;jþ1;k /11;i;jþ1;k þ Biþ1;jþ1;k /11;iþ1;jþ1;k þ Bi;jþ1;kþ1 /11;i;jþ1;kþ1 þ Biþ1;jþ1;kþ1 /11;iþ1;jþ1;kþ1 h i  14 Bi;j;k /11;i;j;k þ Biþ1;j;k /11;iþ1;j;k þ Bi;j;kþ1 /11;i;j;kþ1 þ Biþ1;j;kþ1 /11;iþ1;j;kþ1 h i 1 1 1 þ 14 C i;j;kþ1 /1 1;i;j;kþ1 þ C iþ1;j;kþ1 /1;iþ1;j;kþ1 þ C i;jþ1;kþ1 /1;i;jþ1;kþ1 þ C iþ1;jþ1;kþ1 /1;iþ1;jþ1;kþ1 h i 1 1 1  14 C i;j;k /1 1;i;j;k þ C iþ1;j;k /1;iþ1;j;k þ C i;jþ1;k /1;i;jþ1;k þ C iþ1;jþ1;k /1;iþ1;jþ1;k   þ RR V/00 iþ1=2;jþ1=2;kþ1=2 ¼ ðQV Þiþ1=2;jþ1=2;kþ1=2 1 4

ð8Þ

where

ðV RR /Þiþ1=2;jþ1=2;kþ1=2 ¼18ðV i;j;k RR;i;j;k /i;j;k þV i;j;kþ1 RR;i;j;kþ1 /i;j;kþ1 þV i;jþ1;k RR;i;jþ1;k /i;jþ1;k þV i;jþ1;kþ1 RR;i;jþ1;kþ1 /i;jþ1;kþ1

ð9Þ

þV iþ1;j;k RR;iþ1;j;k /iþ1;j;k þV iþ1;j;kþ1 RR;iþ1;j;kþ1 /iþ1;j;kþ1 þV iþ1;jþ1;k RR;iþ1;jþ1;k /iþ1;jþ1;k þV iþ1;jþ1;kþ1 RR;iþ1;jþ1;kþ1 /iþ1;jþ1;kþ1 Þ

þV i;jþ1;k Q i;jþ1;k þV i;jþ1;kþ1 Q i;jþ1;kþ1 þV iþ1;j;k Q iþ1;j;k

ð10Þ

þV iþ1;j;kþ1 Q iþ1;j;kþ1 þV iþ1;jþ1;k Q iþ1;jþ1;k þV iþ1;jþ1;kþ1 Q iþ1;jþ1;kþ1 Þ 

A/01 iþ1;j;k

ð12Þ

  1 1 2/1 1;i;j;kþ1 ¼ /1;i;j;kþ3=2 þ /1;i;j;kþ1=2

ð13Þ

Eq. (8) is an ‘‘analogy” to a balance equation over the super mesh shown in Fig. 3, where the leakage terms are evaluated at cell centers, and the removal term and source term are the average values over the eight sub-cells. In order to transform Eq. (8) into the source-corrected DSA equation, we have to seek the relation between the first-order moment and the scalar flux. Applying three discrete-to-moment R R R operators ( Y 01 dX, Y 11 dX and Y 1 1 dX) on Eq. (2) yields

  Aiþ1=2;j;k /00;iþ1=2;j;k  Ai1=2;j;k /00;i1=2;j;k   þ 23 Aiþ1=2;j;k /02;iþ1=2;j;k  Ai1=2;j;k /02;i1=2;j;k   pffiffi þ 33 Bi;j;k /12;i;jþ1=2;k  /12;i;j1=2;k   pffiffi 1 0 þ 33 C i;j;k /1 2;i;j;kþ1=2  /2;i;j;k1=2 þ Rt;i;j;k V i;j;k /1;i;j;k ¼ 0

 1 Aiþ1=2;j;k /01;iþ1=2;j;k þ Aiþ3=2;j;k /01;iþ3=2;j;k ¼ 2

ð11Þ

ð14Þ

 pffiffi 3 Aiþ1=2;j;k /12;iþ1=2;j;k Ai1=2;j;k /12;i1=2;j;k 3

  þ13Bi;j;k /00;i;jþ1=2;k /00;i;j1=2;k   pffiffi   13Bi;j;k /02;i;jþ1=2;k /02;i;j1=2;k þ 33Bi;j;k /22;i;jþ1=2;k /22;i;j1=2;k   pffiffi 2 1 þ 33C i;j;k /2 2;i;j;kþ1=2 /2;i;j;k1=2 þRt;i;j;k V i;j;k /1;i;j;k ¼0  pffiffi  1 3 Aiþ1=2;j;k /1 2;iþ1=2;j;k  Ai1=2;j;k /2;i1=2;j;k 3   pffiffi 2 þ 33 Bi;j;k /2 2;i;jþ1=2;k  /2;i;j1=2;k 





1 C /00;i;j;kþ1=2  /00;i;j;k1=2  13 C i;j;k /02;i;j;kþ1=2  /02;i;j;k1=2 3 i;j;k   pffiffi  33 C i;j;k /22;i;j;kþ1=2  /22;i;j;k1=2 þ t;i;j;k V i;j;k /1 1;i;j;k ¼ 0

ð15Þ



ð16Þ

R

It should be noted that not all the high-order angular redistribution integrate to zero. In current study, we have neglected the integration of ‘‘l-” and ‘‘g-” order redistribution. Moreover, the following auxiliary equations are used while deriving Eqs. (14–16)

l2 ¼

2Y 02 þ Y 00 3

ð17Þ

Y0 Y0 g ¼ 0 2þ 3 3

pffiffiffi 2 3Y 2 3

Y 00 Y 02   3 3

pffiffiffi 2 3Y 2 3

2

ðVQ Þiþ1=2;jþ1=2;kþ1=2 ¼18ðV i;j;k Q i;j;k þV i;j;kþ1 Q i;j;kþ1



  2/11;i;jþ1;k ¼ /11;i;jþ3=2;k þ /11;i;jþ1=2;k

1 3

RR ¼ Rt  Rs

Z 

Moreover, the cell average flux moments are approximately equal to the arithmetic average of the ones on the opposite surfaces

n2 ¼

ð18Þ ð19Þ

Therefore, the expressions of the first-order moments can be obtained

Table 1 The normalized spherical harmonics. 1

2

3

4

5

6

7

8

Y 00

Y 01

Y 11

Y 1 1

Y 02 2

Y 12

Y 1 2

Y 22

1

l

g

n

1 2

  3l  1

pffiffiffi 3lg

pffiffiffi 3ln

pffiffi  3 2 2

g  n2

9 

Y 2 pffiffiffi2 3gn

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

4

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Dright i;jþ1=2;kþ1=2 ¼

Dleft i;jþ1=2;kþ1=2 ¼

 Di;j;k Di;j;kþ1 1 þ Aiþ1=2;j;kþ1 Aiþ1=2;j;k 4 Dr i Dr i  Di;jþ1;k Di;jþ1;kþ1 þ Aiþ1=2;jþ1;kþ1 þ Aiþ1=2;jþ1;k Dr i Dr i  Di;j;k Di;j;kþ1 1 þ Ai1=2;j;kþ1 Ai1=2;j;k 4 Dr i Dr i  Di;jþ1;k Di;jþ1;kþ1 þ Ai1=2;jþ1;kþ1 þ Ai1=2;jþ1;k Dr i Dr i

D0iþ1=2;jþ1;kþ1=2 ¼

D0iþ1=2;j;kþ1=2 ¼ Fig. 3. The super mesh containing eight cells.

/01;i;j;k

 Di;j;k  ¼ Aiþ1=2;j;k /00;iþ1=2;j;k  Ai1=2;j;k /00;i1=2;j;k V i;j;k   þ F /02 ; /12 ; /1 2  Di;j;k  0 /0;i;jþ1=2;k  /00;i;j1=2;k 2pri Dhj   þ G /02 ; /12 ; /22 ; /2 2

ð20Þ

/1 1;i;j;k ¼ 



Di;j;k 0 /0;i;j;kþ1=2  /00;i;j;k1=2 Dzk



D0iþ1=2;jþ1=2;k ¼ ð21Þ 

 2 2 þ H /02 ; /1 2 ; /2 ; /2

ð22Þ

Di;j;k ¼

1

ð23Þ

3Rt;i;j;k

þ

h

1 Diþ1;j;k 4 Dr iþ1

þ

Dright iþ1;jþ1=2;kþ1=2 ¼

Dleft iþ1;jþ1=2;kþ1=2 ¼

ð31Þ

ð32Þ

h

ð33Þ Diþ1;j;kþ1 Dr iþ1

Aiþ1=2;j;kþ1 /00;iþ1=2;kþ1

Aiþ1=2;jþ1;k /00;iþ1=2;jþ1;k þ

Diþ1;j;k Driþ1

Aiþ1=2;j;k þ

Diþ1;jþ1;k Dr iþ1

þ

Diþ1;j;kþ1 Dr iþ1

Aiþ1=2;jþ1;k þ  2   þo h þ o z2

0 þDright i;jþ1=2;kþ1=2 /0;iþ1=2;jþ1=2;kþ1=2

with

 Di;j;k Diþ1;j;k 1 C i;j;k þ C iþ1;j;k 4 Dz k Dz k  Di;jþ1;k Diþ1;jþ1;k þ C iþ1;jþ1;k þ C i;jþ1;k Dz k Dzk

Aiþ1=2;j;k /00;iþ1=2;j;k þ

Diþ1;jþ1;k Dr iþ1

¼ 14

0 Dleft iþ1;jþ1=2;kþ1=2 /0;iþ1=2;jþ1=2;kþ1=2

0  Dleft i;jþ1=2;kþ1=2 /0;i1=2;jþ1=2;kþ1=2   D0 iþ1=2;jþ1;kþ1=2 /00;iþ1=2;jþ3=2;kþ1=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;j;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;j1=2;kþ1=2   D0 iþ1=2;jþ1=2;kþ1 /00;iþ1=2;jþ1=2;kþ3=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;jþ1=2;k /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;k1=2   þ RR V/00 iþ1=2;jþ1=2;kþ1=2 ¼ ðQV Þiþ1=2;jþ1=2;kþ1=2  R

 Di;j;kþ1 Diþ1;j;kþ1 1 C i;j;kþ1 þ C iþ1;j;kþ1 4 Dzkþ1 Dzkþ1  Di;jþ1;kþ1 Diþ1;jþ1;kþ1 þ C iþ1;jþ1;kþ1 þ C i;jþ1;kþ1 Dzkþ1 Dzkþ1

R is the leakage correction term. In deriving Eq.(24) from Eqs. (8), (20), (21) and (22) we have used the fact that

F, G and H are functions of the second-order flux moments that will go to zero in the diffusion theory limit. The following source-corrected DSA equation is obtained by substituting the Eqs. (20–22) into the Eq. (8) 0 Dright iþ1;jþ1=2;kþ1=2 /0;iþ3=2;jþ1=2;kþ1=2

 Di;jþ1;k Diþ1;jþ1;k 1 Bi;jþ1;k þ Biþ1;jþ1;k 4 2pri Dhjþ1 2pr iþ1 Dhjþ1  Di;jþ1;kþ1 Diþ1;jþ1;kþ1 ð29Þ þ Bi;jþ1;kþ1 þ Biþ1;jþ1;kþ1 2pr i Dhjþ1 2priþ1 Dhjþ1

 0   A/1 i ¼ Ai /01;i þ o r2

where D is the diffusion coefficient defined by

ð28Þ

 Di;j;k Diþ1;j;k Di;j;kþ1 1 þ Biþ1;j;k þ Bi;j;kþ1 Bi;j;k 4 2 p r i Dh j 2pr iþ1 Dhj 2pr i Dhj  Diþ1;j;kþ1 þ Biþ1;j;kþ1 ð30Þ 2priþ1 Dhj

D0iþ1=2;jþ1=2;kþ1 ¼

/11;i;j;k ¼ 

ð27Þ

Diþ1;jþ1;kþ1 Dr iþ1

Aiþ1=2;jþ1;kþ1 /00;iþ1=2;jþ1;kþ1

i

Aiþ1=2;j;kþ1

Diþ1;jþ1;kþ1 Dr iþ1

i Aiþ1=2;jþ1;kþ1  /00;iþ1=2;jþ1=2;kþ1=2 ð34Þ

ð24Þ

 Diþ1;j;k Diþ1;j;kþ1 1 Aiþ3=2;j;k þ Aiþ3=2;j;kþ1 4 Driþ1 Dr iþ1  Diþ1;jþ1;k Diþ1;jþ1;kþ1 ð25Þ þ Aiþ3=2;jþ1;kþ1 þ Aiþ3=2;jþ1;k Driþ1 Dr iþ1  Diþ1;j;k Diþ1;j;kþ1 1 þ Aiþ1=2;j;kþ1 Aiþ1=2;j;k 4 Driþ1 Dr iþ1  Diþ1;jþ1;k Diþ1;jþ1;kþ1 ð26Þ þ Aiþ1=2;jþ1;kþ1 þ Aiþ1=2;jþ1;k Driþ1 Dr iþ1

1 4

h

D

D

Bi;jþ1;k 2pri;jþ1;k /00;i;jþ1=2;k þ Biþ1;jþ1;k 2priþ1;jþ1;k /00;iþ1;jþ1=2;k i Dhjþ1 iþ1 Dhjþ1 D

D

/0 þ Biþ1;jþ1;kþ1 2piþ1;jþ1;kþ1 /0 þBi;jþ1;kþ1 2pi;jþ1;kþ1 ri Dhjþ1 0;i;jþ1=2;kþ1 r iþ1 Dhjþ1 0;iþ1;jþ1=2;kþ1 h D D þ Biþ1;jþ1;k 2priþ1;jþ1;k ¼ 14 Bi;jþ1;k 2pri;jþ1;k i Dhjþ1 iþ1 Dhjþ1 i Di;jþ1;kþ1 Diþ1;jþ1;kþ1 þBi;jþ1;kþ1 2pri Dhjþ1 þ Biþ1;jþ1;kþ1 2priþ1 Dhjþ1  /00;iþ1=2;jþ1=2;kþ1=2     þo r 2 þ o z2

i

ð35Þ 1 4

h

C i;j;kþ1

Di;j;kþ1 Dzkþ1

/00;i;j;kþ1=2 þ C iþ1;j;kþ1

D

Diþ1;j;kþ1 Dzkþ1

/00;iþ1;j;kþ1=2 D

/00;i;jþ1;kþ1=2 þ C iþ1;jþ1;kþ1 iþ1;jþ1;kþ1 /00;iþ1;jþ1;kþ1=2 þC i;jþ1;kþ1 i;jþ1;kþ1 Dzkþ1 Dzkþ1 h D D D þC iþ1;j;kþ1 iþ1;j;kþ1 þ C i;jþ1;kþ1 i;jþ1;kþ1 ¼ 14 C i;j;kþ1 Di;j;kþ1 zkþ1 Dzkþ1 Dzkþ1 i  2   Diþ1;jþ1;kþ1 0 þC iþ1;jþ1;kþ1 Dz /0;iþ1=2;jþ1=2;kþ1=2 þ o r þ o h2

i

kþ1

ð36Þ

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

5

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

It can be found that the coefficient matrix of Eq. (24) is not symmetric because of Dright –Dleft i i . Therefore, the CG method cannot be used to solve Eq. (24) directly. In order to make the coefficient matrix of Eq. (24) be a SPD (symmetric positive definite) matrix, we have done the following transformation left 0 0 Dright i;jþ1=2;kþ1=2 /0;iþ1=2;jþ1=2;kþ1=2  Di;jþ1=2;kþ1=2 /0;i1=2;jþ1=2;kþ1=2     ¼ D0 i;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;i1=2;jþ1=2;kþ1=2 þ o r2

ð37Þ

and

  left D0 i;jþ1=2;kþ1=2 ¼ 12 Dright i;jþ1=2;kþ1=2 þ Di;jþ1=2;kþ1=2 h i D D D D Ai;j;k þ i;j;kþ1 Ai;j;kþ1 þ i;jþ1;k Ai;jþ1;k þ i;jþ1;kþ1 Ai;jþ1;kþ1 ¼ 14 Di;j;k r Dr Dr Dr i

i

i

Therefore, the expression of Rr is

      Rr ¼ o r 2 1 þ o r 2 2 þ o r 2 3 h i        ¼ 14 A/01 iþ1;j;k þ A/01 iþ1;jþ1;k þ A/01 iþ1;j;kþ1 þ A/01 iþ1;jþ1;kþ1 h i         14 A/01 i;j;k þ A/01 i;j;kþ1 þ A/01 i;jþ1;k þ A/01 i;jþ1;kþ1   þD0 iþ1;jþ1=2;kþ1=2 /00;iþ3=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;kþ1=2   D0 i;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;i1=2;jþ1=2;kþ1=2 ð44Þ The derivations of Rh and Rz are similar to Rr, and the finial expression of R can be written as follows

i

ð38Þ Finally, we solve the following source-corrected DSA equation

  D0 iþ1;jþ1=2;kþ1=2 /00;iþ3=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 i;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;i1=2;jþ1=2;kþ1=2   D0 iþ1=2;jþ1;kþ1=2 /00;iþ1=2;jþ3=2;kþ1=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;j;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;j1=2;kþ1=2

ð39Þ

  D0 iþ1=2;jþ1=2;kþ1 /00;iþ1=2;jþ1=2;kþ3=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;jþ1=2;k /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;k1=2

with

  Ldiffusion ¼ D0 iþ1;jþ1=2;kþ1=2 /00;iþ3=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 i;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;i1=2;jþ1=2;kþ1=2   D0 iþ1=2;jþ1;kþ1=2 /00;iþ1=2;jþ3=2;kþ1=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;j;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;j1=2;kþ1=2   D0 iþ1=2;jþ1=2;kþ1 /00;iþ1=2;jþ1=2;kþ3=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;jþ1=2;k /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;k1=2

and

Now we should derive the expression of R. It’s obvious that the leakage correction term is made up of three components along the r-, h- and z-axes, namely

ð40Þ

We have used the high-order small quantity along the r-axis three times (Eq. (33), (34) and (37)) while deriving the final source-corrected DSA equation. They can be expressed respectively as follows h i          o r 2 1 ¼ 14 A/01 iþ1;j;k þ A/01 iþ1;jþ1;k þ A/01 iþ1;j;kþ1 þ A/01 iþ1;jþ1;kþ1 h i         14 A/01 i;j;k þ A/01 i;j;kþ1 þ A/01 i;jþ1;k þ A/01 i;jþ1;kþ1 h i  14 Aiþ1;j;k /01;iþ1;j;k þ Aiþ1;jþ1;k /01;iþ1;jþ1;k þ Aiþ1;j;kþ1 /01;iþ1;j;kþ1 þ Aiþ1;jþ1;kþ1 /01;iþ1;jþ1;kþ1 h i þ 14 Ai;j;k /01;i;j;k þ Ai;jþ1;k /01;i;jþ1;k þ Ai;j;kþ1 /01;i;j;kþ1 þ Ai;jþ1;kþ1 /01;i;jþ1;kþ1 ð41Þ

h   o r 2 2 ¼ 14 Aiþ1;j;k /01;iþ1;j;k þ Aiþ1;jþ1;k /01;iþ1;jþ1;k þ Aiþ1;j;kþ1 /01;iþ1;j;kþ1 i þAiþ1;jþ1;kþ1 /01;iþ1;jþ1;kþ1 h i  14 Ai;j;k /01;i;j;k þ Ai;jþ1;k /01;i;jþ1;k þ Ai;j;kþ1 /01;i;j;kþ1 þ Ai;jþ1;kþ1 /01;i;jþ1;kþ1

ð45Þ

ð46Þ

  þ RR V/00 iþ1=2;jþ1=2;kþ1=2 ¼ ðQV Þiþ1=2;jþ1=2;kþ1=2  R

R ¼ Rr þ Rh þ Rz

R ¼ Ltransport  Ldiffusion

ð42Þ

left 0 0 þDright iþ1;jþ1=2;kþ1=2 /0;iþ3=2;jþ1=2;kþ1=2  Diþ1;jþ1=2;kþ1=2 /0;iþ1=2;jþ1=2;kþ1=2 left 0 0 Dright i;jþ1=2;kþ1=2 /0;iþ1=2;jþ1=2;kþ1=2 þ Di;jþ1=2;kþ1=2 /0;i1=2;jþ1=2;kþ1=2

  o r 2 3 ¼Dright /0 þDleft /0 iþ1;jþ1=2;kþ1=2 0;iþ3=2;jþ1=2;kþ1=2 iþ1;jþ1=2;kþ1=2 0;iþ1=2;jþ1=2;kþ1=2 left 0 0 þDright i;jþ1=2;kþ1=2 /0;iþ1=2;jþ1=2;kþ1=2 Di;jþ1=2;kþ1=2 /0;i1=2;jþ1=2;kþ1=2   þD0 iþ1;jþ1=2;kþ1=2 /00;iþ3=2;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2   D0 i;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2 /00;i1=2;jþ1=2;kþ1=2

ð43Þ

h i        Ltransport ¼ 14 A/01 iþ1;j;k þ A/01 iþ1;j;kþ1 þ A/01 iþ1;jþ1;k þ A/01 iþ1;jþ1;kþ1 h i         14 A/01 i;j;k þ A/01 i;j;kþ1 þ A/01 i;jþ1;k þ A/01 i;jþ1;kþ1 h i þ 14 Bi;jþ1;k /11;i;jþ1;k þ Biþ1;jþ1;k /11;iþ1;jþ1;k þ Bi;jþ1;kþ1 /11;i;jþ1;kþ1 þ Biþ1;jþ1;kþ1 /11;iþ1;jþ1;kþ1 h i  14 Bi;j;k /11;i;j;k þ Biþ1;j;k /11;iþ1;j;k þ Bi;j;kþ1 /11;i;j;kþ1 þ Biþ1;j;kþ1 /11;iþ1;j;kþ1 h i 1 1 1 þ 14 C i;j;kþ1 /1 1;i;j;kþ1 þ C iþ1;j;kþ1 /1;iþ1;j;kþ1 þ C i;jþ1;kþ1 /1;i;jþ1;kþ1 þ C iþ1;jþ1;kþ1 /1;iþ1;jþ1;kþ1 h i 1 1 1  14 C i;j;k /1 1;i;j;k þ C iþ1;j;k /1;iþ1;j;k þ C i;jþ1;k /1;i;jþ1;k þ C iþ1;jþ1;k /1;iþ1;jþ1;k ð47Þ

It can be seen that the correction term is the difference between the transport leakage on the mesh centers and the diffusion leakage on the mesh vertexes. That is to say, we solve the transport equation on the mesh centers but the diffusion equation on mesh vertexes. Currently, only the vacuum and reflective boundary conditions have been implemented in this study. 3. The KBA parallel algorithm The KBA parallel algorithm devised by Koch, Baker and Alcouffe is a widely accepted parallel sweep algorithm. The KBA algorithm partitions a body by assigning a column of cells to each processor. Each column of cells is decomposed into several computational blocks as shown in Fig. 4. The KBA algorithm is essentially the (d-1) dimensional decomposition for d-dimensional meshes. Therefore, the processors for 3-D parallel calculation are distributed in a 2-D map. In this paper, spatial domain decomposition is performed on the r-h plane. For a given direction, the sweep order through blocks is shown in Fig. 5. Firstly, only one block is calculated. As long as the calculation in the first block is completed, its outgoing angular fluxes are broadcasted into the adjacent processors as their incoming angular fluxes. The right and back processors are started as soon as they receive the incoming angular fluxes from previous processors. Then, another three processors are started in turn. When the pro-

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

6

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Ax ¼ b

ð50Þ

where x is an unknown vector, b is a known vector, and A is a known SPD matrix. In this paper, the CG method is applied to solve the diffusion equation. The standard CG procedure is given in Table 2. The iterative number of the standard CG algorithm depends on the initial value x0 and the eigenvalue distribution of A. Moreover, we can find that the matrix-vector multiplication is the most time-consuming part of the standard CG procedure. Preconditioning is a technique for improving the eigenvalue distribution of a matrix. Suppose that M is a matrix that approximates A, but is easier to invert. Therefore, we can solve Ax = b indirectly by solving

M1 Ax ¼ M 1 b Fig. 4. Spatial domain decomposition of the KBA algorithm with 9 processors (Xu et al., 2018).

cessor finishes its task in current direction, it starts the next task in another direction belongs to the same octant. Octants in the same quadrant such as (+,+,) and (+,+,+) are pipelined to achieve excellent scaling efficiency. In the X-Y-Z geometry, the theoretical PCE (parallel computational efficiency) given by (Xu et al., 2018) is

8 8MNk > > < PCExyz ¼ 8MNk þ3Pmax þ2Pmin 5   Pmax ¼ max Px ; Py > >   : Pmin ¼ min Px ; Py

ð48Þ

where M is the number of discrete angles in one octant, and Nk is the number of blocks along the z-axis. Px and Py are the number of processors along the x- and y- axes. The clock octant sweep is applied in the R-h-Z geometry (Xu et al., 2017), and the theoretical PCE is

PCErhz ¼

MNk

ð49Þ

k

MN þ Pr þ P h  2

where Pr and Ph are the number of processors along the r- and haxes. 4. The red-black sweep preconditioned CG method 4.1. Background of CG method

If the eigenvalues of M A are better clustered than those of A, we can iteratively solve Eq. (51) more quickly than the original problem. Unfortunately, M1A is not generally a SPD matrix, even if M and A are. This difficulty is circumvented by solving the following equation (Shewchuk, 1994)

E1 AET ^x ¼ E1 b;

^x ¼ ET x

ð52Þ

where E has the property that EET = M. The matrices E1AET and M1A have the same eigenvalues. This is true because if v is an eigenvector of M1A with eigenvalue k, then ET v will be an eigenvector of E1AET with eigenvalue k

     E1 AET ET v ¼ ET ET E1 Av ¼ ET M1 Av ¼ kET v

ð53Þ

Based on Eq. (52), the PCG (Preconditioned Conjugate Gradient) procedure is given in Table 3. It can be seen that the only difference between algorithm 1 and algorithm 2 is that an extra equation My = r is needed to be solved. If we set M = I in algorithm 2, we will recover the standard CG procedure. A perfect preconditioner is M = A such that M1A has a condition number of one.

Table 2 Algorithm 1: the standard CG procedure. Set r 0 ¼ b  Ax0 ; p0 ¼ r 0 ; k ¼ 0 while r k –0 rT r

ak ¼ pTkApkk k xkþ1 ¼ xk þ ak pk r kþ1 ¼ r k  ak Apk bkþ1 ¼

CG method is effective for solving large systems of linear equations with the form

ð51Þ 1

r Tkþ1 rkþ1 r Tk rk

pkþ1 ¼ r kþ1 þ bkþ1 pk k¼kþ1 endðwhileÞ

Fig. 5. Block sweep order of a given direction (Zhang, 2014).

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

7

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

red-inversion and a black-inversion. In the red-inversion, all the fluxes on the black lines are set as known variables in the righthand side of Eq. (54) so that all red lines are independent and can be solved with the forward elimination – backwards substitution method simultaneously. For example, the fluxes on the vertexes 16 ~ 21 (Fig. 7) will be set as the right-hand sides, when the fluxes on the vertexes 1 ~ 3 are being solved. The blackinversion is similar to the red-inversion except that the righthand side is refreshed with the new fluxes on the red lines. We define this red-black sweep iteration as RB(m) preconditioner, where m is the times of red-black sweep. It’s obvious that

Table 3 Algorithm 2: the PCG procedure. Set r 0 ¼ b  Ax0 Solve My0 ¼ r 0 for y0 p0 ¼ y0 ; k ¼ 0 while r k –0 rT y

ak ¼ pTkApkk k xkþ1 ¼ xk þ ak pk r kþ1 ¼ r k  ak Apk

Solve Mykþ1 ¼ r kþ1 for ykþ1 bkþ1 ¼

rTkþ1 ykþ1 rTk yk

pkþ1 ¼ ykþ1 þ bkþ1 pk k¼kþ1 endðwhileÞ

lim RBðmÞ ¼ lim M1 ðmÞ ¼ A1

m!1

Unfortunately, solving the equation My = r makes this preconditioner worthless. Therefore, our task is to find a preconditioner that approximates A, but is easier to invert. 4.2. Red-black sweep preconditioner It should be noted that it’s not necessary to compute M or M1 explicitly but only necessary to be able to compute the effect of applying M1 to a vector. In order to form such a preconditioner, we move the r- and h-axial components of the diffusion operator of Eq.(39) into the right-hand side as follows



D0 iþ1;jþ1=2;kþ1=2 þ D0 i;jþ1=2;kþ1=2 þ D0 iþ1=2;jþ1;kþ1=2 i þD0 iþ1=2;j;kþ1=2 þ ðRR V Þiþ1=2;jþ1=2;kþ1=2 /00;iþ1=2;jþ1=2;kþ1=2   D0 iþ1=2;jþ1=2;kþ1 /00;iþ1=2;jþ1=2;kþ3=2  /00;iþ1=2;jþ1=2;kþ1=2   þD0 iþ1=2;jþ1=2;k /00;iþ1=2;jþ1=2;kþ1=2  /00;iþ1=2;jþ1=2;k1=2

D

þD

0

0 iþ1=2;j;kþ1=2 /0;iþ1=2;j1=2;kþ1=2

Dred L

U Dblack



xred xblack





¼

bred



bblack

ð56Þ

where Dred and Dblack are the tridiagonal matrices. In order to analyze the red-black sweep iteration, Eq.(56) is transformed equivalently into the following equation sets



þD0 i;jþ1=2;kþ1=2 /00;i1=2;jþ1=2;kþ1=2 0 iþ1=2;jþ1;kþ1=2 /0;iþ1=2;jþ3=2;kþ1=2

ð55Þ

Therefore, a compromise between the effectiveness and the efficiency is available by adjusting the times of red-black sweep iteration. As discussed in Section 3, the spatial domain decomposition of the KBA algorithm is performed on the r-h plane, hence the RB(m) is scalable with number of processors since all of the vertexes on the same z-axis belong to one processor. Therefore, the parallel PCG method matches well with the KBA algorithm. In this study, the RB(m) preconditioned CG method is named after RB(m)-PCG. As shown in Fig. 7, the vertexes are labeled by listing the red vertexes first together, followed by the black ones. With this redblack ordering, the Eq. (39) will have the structure shown in Fig. 8. Therefore, the linear system of diffusion equation can be written as follows



¼ ðQV Þiþ1=2;jþ1=2;kþ1=2 þ D0 iþ1;jþ1=2;kþ1=2 /00;iþ3=2;jþ1=2;kþ1=2 0

m!1

Lxred þ Dblack xblack ¼ bblack

R ð54Þ

If all the variables in the right-hand side are regarded as known ones, the coefficient matrix of Eq. (54) will be a tridiagonal matrix. Then the left-hand side can be easily inverted with the forward elimination – backwards substitution method. Now the vertexes on the r-h plane are colored with red-black arrangement (as shown in Fig. 6) such that the left-hand and right-hand sides in Eq. (54) can be refreshed with each other. As shown in Fig. 7, the vertexes on the z-axis having the same color are defined as red-line and black line, respectively. One full red-black sweep consists of a

Dred xred þ Uxblack ¼ bred

(

ð57Þ

So one full red-black sweep iteration can be expressed as

  1 k xkþ1 red ¼ Dred bred  Uxblack   1 kþ1 xkþ1 black ¼ Dblack bblack  Lxred

ð58Þ

1 where D1 red and Dblack are the red-inversion and the black-inversion, respectively. As mentioned in Section 1, the initial vector x0 does affect the number of CG iterations required to converge to a given tolerance even through it has no effect on the inevitable outcome. In general, the transport fluxes are chosen as the initial iterative values of the CG method. However, we have found something interesting by doing several red-black sweep iterations before starting the CG kþ1 kþ1 or PCG iteration. That is to say, the [xred ,xblack ] is used as the initial vector instead of the fluxes form the transport solver. Of course, the transport fluxes are set as the initial values of the red-black sweep iteration indicated by Eq. (58). The residual vector in the CG or PCG method is defined as

r ¼ b  Ax

ð59Þ

kþ1 With [xkþ1 red ,xblack ] as the initial vector, the residual vector can be computed

    " kþ1 # xred Dred r red U bred ¼ b  Axkþ1 ¼  rblack L Dblack xkþ1 bblack black # "  #   " kþ1 Dred xred þ Uxkþ1 bred U xkblack  xkþ1 black black ¼  ¼ kþ1 bblack 0 Lxkþ1 red þ Dblack xblack 

Fig. 6. The red-black arrangement on the r-h plane. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

ð60Þ

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

8

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Fig. 7. The red-black coloring of the vertexes on a 2  2  2 meshes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

and

 Ay ¼

"

Dred

U

L

Dblack

 # m1 D1 red r red  Uyblack m D1 black Lyred

 ¼

  m r red  U ym1 black  yblack 0 ð63Þ

It can be found that the black part of Apk is always equal to zero, namely

        Apkþ1 red ¼ Aykþ1 red þ bkþ1 Apkþ1 red ; Apkþ1 black ¼ 0

ð64Þ

Based on the analysis above, a new algorithm (shown in Table 4) has been proposed in this study. Compared with the algorithm 2, the algorithm 3 not only provides a better initial iterative vector but also reduces the operation (additions and multiplications) almost by half. Only the red part of the ‘‘matrix-vector multiplication” needs to be calculated. Therefore, the algorithm 3 is called half-reduced RB(m)-PCG method, which is marked as HRB(m)PCG in this study. Fig. 8. The structure of a 3  3  3 vertexes with the red-black ordering. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Table 4 Algorithm 3: the half-reduced RB(m)-PCG procedure.

It’s amazing that the rblack is always equal to zero no matter how many times the red-black sweep iteration is performed. Now we return to the PCG procedure (Algorithm 2) where an extra preconditioned equation My = r is needed to be solved in each iteration. Now that rblack is always equal to zero, y can be inverted with m-times red-black sweep as follows



ym red ym black

"

 ¼

 # m1 D1 red r red  Uyblack m D1 black Lyred

rT

ð62Þ

yred

0 k ¼ 0; a0 ¼ pT 0;red ðAp Þ red 0;red 0 red while r k 2 > tolerance xkþ1 ¼ xk þ ak pk red r red kþ1 ¼ r k  ak ðApk Þred Solve Mykþ1 ¼ r kþ1 for ykþ1

ð61Þ

Apk operation is the most time-consuming part in the standard CG procedure. With the character of the RB(m) preconditioner, Apk can be computed by the following procedure

Apkþ1 ¼ Aykþ1 þ bkþ1 Apk ; Ap0 ¼ Ay0

Solve Mx0 ¼ b with xtrans as the initial black vector ðm  1Þ  xblack ðmÞ ; r black ¼0 Set r red 0 ¼ U x0 0 0 Solve My0 ¼ r 0 for y0 Set p0 ¼ y 0 ; ðAp0 Þblack ¼ 0 black ðAp0 Þred ¼ r red ðm  1Þ  yblack ðmÞ 0  U y0 0

bkþ1 ¼ 

r Tkþ1;red ykþ1;red rTk;red yk;red

 pkþ1 ¼ ykþ1þ bkþ1 pk Apkþ1 red ¼ Aykþ1 red þ bkþ1 ðApk Þred

akþ1 ¼ pT

rTkþ1;red yred kþ1

ðApkþ1 Þred k¼kþ1 endðwhileÞ kþ1;red

* xtrans is the flux form the transport solver.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

9

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

5. Numerical analysis In this Section, both homogeneous and heterogeneous benchmarks have been calculated to analyze the properties of the parallel DSA method. These benchmarks cover PWR Pin-by-pin calculation, PWR shielding calculation and critical calculation of FBR model. All the speed-up factors in this study are defined by reductions in solution time. 5.1. Homogeneous problem A single-group homogeneous problem with three kinds of scattering ratios (0.9, 0.95 and 0.99) has been calculated in the R-h-Z geometries. In this study, the scattering ratio (c) is defined as the ratio of the scattering cross-section to total cross-section. The configuration and boundary conditions are illustrated in Fig. 9. The bottom and h-direction surfaces are set as the reflective boundaries, and other surfaces are set as the vacuum boundaries. The spatial discretization is performed with the 100  100  100 meshes as well as S8 quadrature for the angular discretization. The convergence criterion of the point flux is set to 1.0e5. We have compared the SI scheme (source iteration with no acceleration) and the DSA algorithm with HRB(m)-PCG solution, where m ranges from 1 ~ 6. The computational results such as the inner iterative number (n), solution time (t) and speed-up fac-

Fig. 9. The configuration of homogeneous problem.

tor (c) are summarized in Table 5. The speed-up factor is improved with the increase of the scattering ratio. This is because the medium becomes more and more diffusive as the scattering ratio increases. For the cases (c = 0.9 and 0.95), n decreases as m increase, which is consistent with our expectations. Nevertheless, when c is equal to 0.99, HRB(3)-PCG achieves the best performance. When m increases from 2 to 3, n drops rapidly from 35 to 26. But for other two cases, n drops slowly as m increases. This case may be an exception, and it needs further investigation. The speed-up factor depends on both the number of iteration and the solution time per iteration. For the three cases, the best speed-up factors are achieved when m is equal to 4, 5 and 3, respectively. For realistic problems, m is usually set between 1 and 4. 5.2. Eigenvalue problem The PIN (power iterative number), solution time, speed-up factor and effective multiplication factor are analyzed in this study. The HRB(1)-PCG solver is applied in the DSA algorithm both for the PWR and FBR problems. 5.2.1. PWR Pin-by-pin problem To investigate the effectiveness of the parallel DSA, a PWR Pinby-pin problem provided by (Yang et al., 2018) was calculated. The configuration of this benchmark is shown in Fig. 10, which consists

Fig. 10. The PWR problem consisting of 9 types of assemblies as indicated by color.

Table 5 Numerical results of the homogeneous problem. c

SI DSA

0.9

0.95

0.99

m

n

t(secs)

c

m

n

t(secs)

c

m

n

t(secs)

c

– 1 2 3 4 5 6

82 23 23 21 20 20 20

379.38 131.94 133.88 125.97 123.53 126.73 128.19

– 2.88 2.83 3.01 3.07 2.99 2.96

– 1 2 3 4 5 6

131 30 26 25 24 22 22

603.27 172.02 153.24 150.33 148.61 140.93 143.55

– 3.51 3.93 4.01 4.06 4.28 4.20

– 1 2 3 4 5 6

260 36 35 26 28 31 29

1220.40 214.16 204.88 165.87 177.90 200.16 189.08

– 5.70 5.96 7.36 6.86 6.10 6.45

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

10

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Table 6 The parallel calculation parameters of the PWR problem. Case

Total processors = Px  Py

Px

Py

Nk

M

1 2 3 4 5 6 7

9 16 25 36 64 91 100

3 4 5 6 8 9 10

3 4 5 6 8 9 10

5 5 5 5 5 5 5

3 3 3 3 3 3 3

of 1.8 wt%, 2.4 wt% and 3.1 wt% UO2 fuel assemblies with variety types of burnable position rod cluster and control rod cluster. The detailed description can be found in the reference. This calculation was performed in a three-dimensional 1/8 core with 289  289  25 meshes. The S4 quadrature is used for the angular discretization. The convergence criteria of the eigenvalue, point flux and fission density are set to 1.0e7, 1.0e6 and 1.0e5, respectively. This problem was run on the Pluto cluster at IAPCM from 9 to 100 processors. The number of processors along each axis and other parallel parameters are given in Table 6. It’s too slow to compute this problem on single processor, so both the theoretical and measured PCEs of SI scheme are normalized to 9 processors. The parallel solution time of the SI and DSA schemes are illustrated in Fig. 11(a). As the number of processors increases, the solution time of both schemes drop rapidly. According to Eq. (48), the theoretical PCE of SI scheme (excluding the communication time) can be calculated. The PCE and parallel speed-up factor are shown in Fig. 11(b). As the processor increases, the speed-up factor (obtained by DSA only) fluctuates around 5, which indicates that the HRB(m)-PCG algorithm matches well with the KBA parallel algorithm. The effective multiplication factor results of EFEN and Violet codes come from (Yang et al., 2018). EFEN is nodal-SP3 code developed at XJTU (Li, 2013), and Violet is a Pn transport code based on variational nodal method (Li et al., 2017). The numerical results are

summarized in Table 7. Compared with the solutions of Violet (P3), the keff deviations of the EFEN, SI and DSA schemes are 125, 12 and 14 pcm, respectively. The S4 calculation has the similar accuracy with the P3 calculation. 5.2.2. FBR problem A 3-D cylindrical FBR model is calculated to investigate the effectiveness of the parallel DSA in the R-h-Z geometry. The configuration is illustrated in Fig. 12, which consists of 5 parts: the core, radial breeding zone, radial reflector, axial breeding zone and axial reflector. The transport calculation is performed with 120  8  160 meshes and S8 quadrature. The 4-group macroscopic crosssections come from Takeda’s paper (Takeda and Ikeda, 1991). The convergence criteria of the eigenvalue, point flux and fission density are set to 1.0e7, 1.0e6 and 1.0e5, respectively. This problem was run on the Pluto cluster at IAPCM from 1 to 24 processors. The number of processors along each axis and other parallel parameters are given in Table 8. The parallel solution time of the SI and DSA schemes are illustrated in Fig. 13(a). The theoretical PCE is calculated according to Eq.(49). The PCEs (theoretical & measured) and parallel speed-up factors achieved by DSA are shown in Fig. 13(b). As the processors increase, the speed-up factor almost remains unchanged. The solution time is reduced by over 10 times for all cases. The keff and PIN results are given in Table 9. The keff deviation between the SI and DSA scheme is 1 pcm. The PIN is reduced by 13 times. Therefore, the parallel DSA based on HRB (m)-PCG algorithm is an effective method to accelerate the KBA algorithm in the 3-D cylindrical geometry. 5.3. PWR shielding calculation In this study, a 966 MW(thermal) PWR shielding model consisting of reactor core, core baffle, core bypass region, core barrel, down-comer water, pressure vessel, reactor cavity, biological shield etc. is established in R-h-Z geometry to investigate the properties of the HRB(m)-PCG DSA algorithm for shielding calculation.

Fig. 11. The parallel numerical results of the PWR problem.

Table 7 Numerical results of the PWR Pin-by-pin problem.

keff Deviation (pcm) PIN

Violet

EFEN

SI

DSA

1.01233 – –

1.01108 125 –

1.01245 12 4498

1.01247 14 652

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

11

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Fig. 12. Configuration of the FBR model.

Table 8 The parallel calculation parameters of the FBR problem. Case

Total processors = Pr  Ph

Pr

Ph

Nk

M

1 2 3 4 5

1 4 8 16 24

1 2 4 8 12

1 2 2 2 2

1 4 4 4 4

10 10 10 10 10

Fig. 13. The parallel numerical results of the FBR problem.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

12

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Table 9 The keff and PIN results of the FBR problem.

keff PIN

Table 10 Computational results of the PWR shielding model.

SI

DSA

Scheme

m

n

t(secs)

c

0.91514 494

0.91515 37

SI DSA

– 1 2 3 4

1743 527 491 473 457

3046.82 1054.80 1001.70 974.88 947.35

– 2.89 3.04 3.13 3.22

The front view and vertical view are illustrated in Fig. 14. The spatial discretization is performed with 301  181  125 meshes. The quadrature order is S8. The 30-group MATXS cross-section with P3 Legendre expansion is used. Based on the conclusions in Section 5.1, we have known that the HRB(m)-PCG will usually achieve its best performance when m 2 [1, 4]. Therefore, the DSA algorithm with HRB(1 ~ 4)-PCG solver is investigated in the PWR shielding calculation. 42 processors are used in this parallel calculation. The tolerance of the point flux is set to 1.0e-3. The computational results are summarized in Table 10. In this realistic problem, the HRB(m)-PCG is not very sensitive to the times of red-black sweep. The HRB(4)-PCG solver achieves the best speed-up factor. The solution time is reduced by 3.22 times. Based on HRB(4)-PCG solver, the solution time of single transport sweep and one diffusion solution is analyzed in Table 11. Ttrans_DSA and Ttrans_SI are defined as the solution time of single transport sweep with and without DSA acceleration, respectively. Tdiff indicates the time required to solve the diffusion equation within a given tolerance. Therefore, Ttrans_DSA + Tdiff is the solution time of one inner iteration when the DSA algorithm is used. The solution time of each part is measured by MPI_Wtime function. We found that the Ttrans_DSA was a little longer than the Ttrans_SI. This is because that the transport leakage should be formed in each transport sweep when the DSA algorithm is applied. The ratio of Ttrans_SI to Tdiff shows that the diffusion solver is at least 9.26 times faster than once S8 transport sweep, which indicates the HRB(m)PCG is quite effective. The value of (Ttrans_DSA + Tdiff)/Ttrans_SI is 1.19. Therefore, the DSA algorithm will be effective and efficient as long as the iteration number is reduced by a factor greater than 1.19. The iteration number reduced factor of each energy group is given in Fig. 15.

Table 11 The solution time of single transport sweep or diffusion solution. Solution time (secs)

ratio

Ttrans_SI Ttrans_DSA Tdiff Ttrans_DSA + Tdiff Ttrans_SI/Tdiff (Ttrans_DSA + Tdiff)/Ttrans_SI

1.74729 1.88208 0.18868 2.07076 9.26 1.19

Fig. 15. The iteration number reduced factor of each energy group.

Fig. 14. The configuration of the PWR shielding model.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

The DSA algorithm is effective but inefficient for the first 6 groups. Strong anisotropic scattering of the fast neutron disables the effectiveness of the DSA algorithm. These results indicate that skipping some high energy group may improve the final speed-up factor of the DSA algorithm, which is another topic beyond this paper. 6. Conclusions In this study, a parallel DSA algorithm is implemented in the Rh-Z geometry. With the red-black vertexes ordering, the parallel DSA algorithm matches well with the KBA method. The RB(m) iteration is used as a preconditioner to improve the performance of CG method. A good compromise is available by adjusting the times of red-black iteration. The HRB(m)-PCG algorithm is proposed by using the scalar flux refreshed through several red-black iterations as the initial values. This HRB(m)-PCG algorithm simplifies the matrix-vector multiplication in the standard PCG algorithm. Numerical analyses on a wide range of problems are carried out to investigate the properties and effectiveness of this parallel DSA algorithm. For most fixed source problems, the HRB(4)-PCG solver achieves the best performance, and the total solution time is reduced by 3 ~ 7 times. For eigenvalue problems, the HRB(1)-PCG is quite efficient, and the total solution time is reduced by 5 ~ 10 times. Moreover, the HRB(m)-PCG has very well parallel scalability. As the number of processors increases, the speed-up factor almost remains unchanged. The combination of the KBA and HRB(m)-PCG DSA algorithms dramatically improves the efficiency of the SN transport calculation. The acceleration method proposed in this study is quite useful for nuclear engineering calculation. Acknowledgements This work is supported partly by the National Natural Science Foundation of China (approved number: 11575028, 11405009 and 11771051) and the Foundation of China Academy of Engineering Physics (2014B0103016). Appendix 1 While deriving Eq. (33), we assume that the flux moment /01 is linear with respect to r. The detailed derivation procedure is as follow



A/01

 i;j;k

  ¼ 12 Ai1=2;j;k /01;i1=2;j;k þAiþ1=2;j;k /01;iþ1=2;j;k h i    ¼ 12 Ai;j;k  DAi /01;i1=2;j;k þ Ai;j;k þ DAi /01;iþ1=2;j;k     ¼ 12 Ai;j;k /01;i1=2;j;k þ/01;iþ1=2;j;k þ 12 DAi /01;iþ1=2;j;k /01;i1=2;j;k   ¼Ai;j;k /01;i;j;k þ 12 pDr i Dhj Dzk /01;iþ1=2;j;k /01;i1=2;j;k   ¼Ai;j;k /01;i;j;k þo r2 ð65Þ

Appendix A. Supplementary data Supplementary data to this article can be found online at https://doi.org/10.1016/j.anucene.2019.107236. References Adams, M.L., Larsen, E.W., 2002. Fast iterative methods for discrete-ordinates particle transport calculations. Prog. Nucl. Energy 40 (1), 3–159. https://doi.org/ 10.1016/S0149-1970(01)00023-3.

13

Alcouffe, R.E., 1977. Diffusion synthetic acceleration methods for the diamonddifferenced discrete-ordinates equations. Nucl. Sci. Eng. 64 (2), 344–355. https://doi.org/10.13182/NSE77-1. Alcouffe, R.E., Baker, R.S., Dahl, J.A., Turner, S.A., Ward, R., 2005. PARTISN: a Time dependent, Parallel Neutral Particle Transport Code System. Los Alamos National Laboratory Report, LA-UR-05-3925. Alcouffe, R.E., Baker, R.S., Dahl, J.A., Turner, S.A., Ward, R., 2005. PARTISN: A TimeDependent, Parallel Neutral Particle Transport Code System. Los Alamos National Laboratory Report, LA-UR-05-3925. Azmy, Y.Y., 1992. Performance and performance modeling of a parallel algorithm for solving the neutron transport equation. J. Supercomput. 6 (3–4), 211–235. https://doi.org/10.1007/BF00155800. Azmy, Y.Y., Dorning, J.J., 1985. Diffusion Synthetic Acceleration of the MultiDimensional Discrete Nodal Transport Method. Advances in Nuclear Engineering Computational Methods, Knoxville, Tennessee, April 9-11, American Nuclear Society, 2, 440–451. Bailey, T.S., Hawkins, W.D., Adams, M.L., 2011. A Piecewise Linear Discontinuous Finite Element Spatial Discretization of the SN Transport Equation for Polyhedral Grids in 3D Cartesian Geometry. Lawrence Livermore National Lab Report, LLNL-PROC-494371. Bando, M., Yamamoto, T., Saito, Y., Takeda, T., 1985. Three-dimensional transport calculation method for eigenvalue problems using diffusion synthetic acceleration. J. Nucl. Sci. Technol. 22 (10), 841–850. https://doi.org/10.1080/ 18811248.1985.9735733. Carlson, B.G., 1953. Solution of the Transport Equation by SN Approximations. Los Alamos Scientific Laboratory Report, LA-1599. Cheng, T.P., Mo, Z.Y., Zhang, G.C., Yan, J., Xu, Q., Fu, Y.G., Deng, L., 2017. Radiation Shielding Calculation Using the Capabilities of Large-Scale Mesh Generation and Efficient Parallel Computing in JSNT on Tens of Thousands of Processors. M&C, Jeju, Korea, April 16-20, 2017. Cheng, Y.X., Zhang, B., Zhang, L., Zheng, J.X., Zheng, Y., Liu, C., 2017b. ARES: a parallel discrete oridinates transport code for radiation shielding and reactor physics. Sci. Technol. Nucl. Installations 2017, 2596727. https://doi.org/10.1155/2017/ 2596727. Courau, T., Moustafa, S., Plagne, L., Poncot, A., 2013. DOMINO: a fast 3D Cartesian discrete ordinates solver for reference PWR simulations and SPN validation. Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo. Davidson, G.G., Evans, T.M., Jarrell, J.J., Hamilton, S.P., 2014. Massively parallel, three-dimensional transport solutions for the k-eigenvalue problem. Nucl. Sci. Eng. 177 (2), 111–125. https://doi.org/10.13182/NSE12-101. Evans, T.M., Stafford, A.S., Slaybaugh, R.N., Clarno, K.T., 2010. DENOVO: a new threedimensional parallel discrete ordinates code in SCALE. Nucl. Technol. 171 (2), 171–200. https://doi.org/10.13182/NT171-171. Gelbard, E.M., Hageman, L.A., 1969. The synthetic method as applied to the SN equations. Nucl. Sci. Eng. 37 (2), 288–298. https://doi.org/10.13182/NSE69A20689. Khattab, K.M., Larsen, E.W., 1991. Synthetic acceleration methods for linear transport problems with highly anisotropic scattering. Nucl. Sci. Eng. 107 (3), 217–227. https://doi.org/10.13182/NSE91-A23786. Koch, K.R., Baker, R.S., Alcouffe, R.E., 1991. Solution of the First-Order Form of the 3D Discrete-Ordinates Equations on a Massively Parallel Machine. Los Alamos National Laboratory Report, LA-UR-91-4157. Larsen, E.W., 1982. Unconditionally stable diffusion synthetic acceleration methods for the slab geometry discrete-ordinates equations. Part I: theory. Nucl. Sci. Eng. 82 (1), 47–63. https://doi.org/10.13182/NSE82-1. Lathrop, K.D., 1969. Spatial differencing of the transport equation: positivity vs accuracy. J. Comput. Phys. 4 (4), 475–498. https://doi.org/10.1016/0021-9991 (69)90015-1. Li, Y.Z., 2013. Advanced Reactor Core Neutronics Computational Algorithms Based on the Variational Nodal and Nodal SP3 Methods. Xi’an Jiaotong University, Xi’an, China. Li, Z., Wu, H., Li, Y., Cao, L., 2017. Block-digonalization of the variational nodal response matrix using the symmetry group theroy. J. Comput. Phys. 351, 230– 253. https://doi.org/10.1016/j.jcp.2017.09.029. Mansur, R.S., Santos, F.P., Filho, H.A., Barros, R.C., 2014. Diffusion synthetic methods for computational modeling of one-speed slab-geometry transport problems with linearly anisotropic scattering. Prog. Nucl. Energy 73, 179–187. https://doi. org/10.1016/j.pnucene.2013.12.014. Martin, N., Hébert, A., 2009. A Three-dimensional SN high-order diamond differencing discretization with a consistent acceleration scheme. Ann. Nucl. Energy 36 (11–12), 1787–1796. https://doi.org/10.1016/j.anucene.2009.08.014. Morel, J.E., 1982. A synthetic acceleration method for discrete oridinates calculations with highly scattering. Nucl. Sci. Eng. 82 (1), 34–46. https://doi. org/10.13182/NSE82-A19026. Reed, W.H., 1971. The effectiveness of acceleration techniques for iterative methods in transport theory. Nucl. Sci. Eng. 45 (3), 245–254. https://doi.org/10.13182/ NSE71-A19077. Rhoades, W.A., Engle Jr, W.W., 1977. A New Weighted-Diffreence Formulation for Discrete-Ordinates Calculations. ANS winter meeting, San Francisco, CA, USA, 27 Nov – 2 Dec, 1977. Rhoades, W.A., Childs, R.L., 1988. The DORT two-dimensional discrete ordinates transport code. Nucl. Sci. Eng. 99 (1), 88–89. https://doi.org/10.13182/NSE88A23547. Rhoades, W.A., Simpson, D.B., 1997. The TORT there-dimensional discrete ordinates neutron/photon transport Code. Comput. Phys. Eng. Div.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236

14

L. Xu et al. / Annals of Nuclear Energy xxx (xxxx) xxx

Roberts, L., Anistratov, D.Y., 2007. Nonlinear weighted flux methods for particle transport problems. Transp. Theory Stat. Phys. 36 (7), 589–608. https://doi.org/ 10.1080/00411450701703647. Shewchuk, J.R., 1994. An Introduction to the Conjugate Gradient Method without the Agonizing Pain. School of Computer Science. Carnegie Mellon University, Pittsburgh, USA. Sjoden, G.E., Haghighat, A., 1997. PENTRAN – A 3-D Cartesian Parallel SN Code with Angular, Energy and Spatial Decomposition. Proceeding of Joint International Conference: Mathematical Methods and Supercomputing for Nuclear Applications, Saratoga Springs, New York, October 5-9, 1997. Takeda, T., Ikeda, H., 1991. 3-D Neutron transport benchmarks. J. Nucl. Sci. Technol. 28, 656–669. https://doi.org/10.1080/18811248.1991.9731408. Warsa, J.S., Wareing, T.A., Morel, J.E., 2001. Fully Consistent Diffusion Synthetic Acceleration of Linear Discontinuous Transport Discretizations on Three-

Dimensional Unstructured Meshes. Los Alamos National Laboratory Report, LA-UR-01-2674. Xu, L.F., Cao, L.Z., Zheng, Y.Q., Wu, H.C., 2017. Development of a new parallel SN code for neutron-photon transport calculation in 3-D cylindrical geometry. Prog. Nucl. Energy 94, 1–21. https://doi.org/10.1016/j.pnucene.2016.09.005. Xu, L.F., Cao, L.Z., Zheng, Y.Q., Wu, H.C., 2018. Hybrid MPI-communication for the multi-angular SN parallel sweep on 3-D regular grids. Ann. Nucl. Energy 116, 407–416. https://doi.org/10.1016/j.anucene.2018.03.003. Yang, W., Wu, H.C., Li, Y.Z., Cao, L.Z., Wang, S.C., 2018. Acceleration of the exponential function expansion nodal SP3 Method by multi-group GMRES algorithm for PWR pin-by-pin calculation. Ann. Nucl. Energy 120, 869–879. https://doi.org/10.1016/j.anucene.2018.07.005. Zhang, G.C., 2014. Three-Dimensional Deterministic Neutronics Calculation Methods for ITER Test Blanket Module. Xi’an Jiaotong University, Xi’an, China.

Please cite this article as: L. Xu, H. Shen, J. Wei et al., A parallel DSA algorithm in the 3-D cylindrical geometry, Annals of Nuclear Energy, https://doi.org/ 10.1016/j.anucene.2019.107236