A new aggregation algorithm based on coordinates partitioning recursively for algebraic multigrid method

A new aggregation algorithm based on coordinates partitioning recursively for algebraic multigrid method

Accepted Manuscript A new aggregation algorithm based on coordinates partitioning recursively for algebraic multigrid method Jian-ping Wu, Pei-ming Gu...

509KB Sizes 1 Downloads 56 Views

Accepted Manuscript A new aggregation algorithm based on coordinates partitioning recursively for algebraic multigrid method Jian-ping Wu, Pei-ming Guo, Fu-kang Yin, Jun Peng, Jin-hui Yang

PII: DOI: Reference:

S0377-0427(18)30329-7 https://doi.org/10.1016/j.cam.2018.05.052 CAM 11717

To appear in:

Journal of Computational and Applied Mathematics

Received date : 1 March 2017 Revised date : 26 January 2018 Please cite this article as: J. Wu, P. Guo, F. Yin, J. Peng, J. Yang, A new aggregation algorithm based on coordinates partitioning recursively for algebraic multigrid method, Journal of Computational and Applied Mathematics (2018), https://doi.org/10.1016/j.cam.2018.05.052 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Manuscript Click here to view linked References

A NEW AGGREGATION ALGORITHM BASED ON COORDINATES PARTITIONING RECURSIVELY FOR ALGEBRAIC MULTIGRID METHOD JIAN-PING WU

†,

PEI-MING GUO

†,

FU-KANG YIN YANG †

†,

JUN PENG

† , AND

JIN-HUI

Abstract. Aggregation based algebraic multigrid is widely used to solve sparse linear systems, due to its potential to achieve asymptotic optimal convergence and cheap cost to setup. In this kind of method, it is vital to construct coarser grids based on aggregation. In this paper, we provide a new aggregation method based on coordinates partitioning recursively. The adjacent graph of the original coefficient matrix is partitioned into sub-graphs and each sub-graph is recursively partitioned until the minimal number of nodes over the sub-graphs on some level is small enough. In this way, a hierarchy of grids can be constructed from top to bottom, which is completely different from the classical schemes. The results from the solution of model partial differential equations with the preconditioned conjugate gradient iteration show that the new algorithm has better performance and is more robust than the widely used classical algorithms in most cases. Key words. Sparse linear algebraic equations, aggregation based algebraic multigrid, preconditioner, Krylov subspace method, graph partitioning AMS subject classifications.

1. Introduction. The solution of sparse linear system is the kernel of many scientific and engineering computations. But it is very time-consuming. To solve large sparse linear systems, there are many methods, and the Krylov subspace iterations and the multigrid methods are the most efficient ones. When Krylov subspace iterations are used, the convergence rate is determined by the distribution of the eigenvalues of the coefficient matrix. The narrower area the eigenvalues are distributed in, the more convergence rate the related method will be. To accelerate the convergence rate, preconditioning techniques are often used, which transfer the linear system to another with better eigenvalue distribution. The preconditioning process is corresponding to an approximate solution to the original system and the more accurate this process is, the less number of iterations there will be. Multigrid is one of such processes and has the potential asymptotic optimal property. It is one of the best combinations to apply it as the preconditioner of Krylov subspace methods. The efficiency of multigrid methods is determined by the complement of smoothing and correction. The smoothing can reduce the error components with relatively higher frequencies. The rest corresponding to lower frequencies is reduced very slowly and are restricted to a coarser grid to compute the correction [1]. When the smoothing process is given, the efficiency is influenced by two factors, that is, the accuracy of the correction, and the accuracy of the transfer operators. Algebraic multigrid depends mainly on the algebraic information of the coefficient matrix, and exploits geometry information as little as possible. The aggregation based multigrid is one kind of such methods and is widely focused on, for its cheap cost to setup [2]. In these methods, each coarser grid point is related to several finer grid points, which is called an aggregation. When the linear system is given, the † School of Meteorology and Oceanography, National University of Defense Technology, Changsha, China, 410073 ([email protected], [email protected], [email protected], [email protected], [email protected]).

1

2

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

interpolation operator, the restrictor and the coarser grid operator are all determined by the selection of the aggregations only. Therefore, aggregation is the core of this kind of methods. The aggregations can be constructed merely from the adjacent graph. For advectiondiffusion equation, Kim et al. provided a two-point aggregation scheme based on strong connections [3]. In this method, each aggregation contains two points at most, which leads to too many levels in general and may decrease the performance. To reduce the number of levels, the two-point method can be applied two or several times each time to determine the aggregations [4] [5]. Motivated by the preservation of cell structures, Dendy et al. developed a method to coarsen the grid points by a factor of three [6]. And the numerical experiments have demonstrated its robustness. Vanek provided another scheme based on strong coupling [7], where the isolated points are added to some current aggregation as soon as possible, to improve the quality of the grid hierarchy. Kumar provided an algorithm based on graph partitioning for highly discontinuous convection-diffusion problem, with each sub-graph corresponding to an aggregation [8]. With ILU(0) as the smoother and ILUT as the solution scheme for the coarser grid problem, there derived a two-grid method, which is more robust than the classical algebraic multigrids based on strength of connection. Recently, Chen et al. have done experiments for various aggregations and found that those aligned with the anisotropy are the best [9]. From this viewpoint and the consideration of automatic coarsening, they suggested to aggregate according to the eigenvectors associated with small eigenvalues or the minimization of a quantity related to the spectral radius of the iterative matrix. When geometry information under the sparse linear system is known, it can also be used as soon as possible to select the aggregations. For two-dimensional second order elliptic partial differential equation, Braess provided two algorithms [10]. The first is based on seven known geometry structures, and in the second, the nodes are grouped into subsets according to the edge weights, and then every two subsets are aggregated based on the number of edges between them. At the end of the algorithm, each aggregation contains no more than four nodes. Okamoto et al. provided a global coarsening algorithm for the selection of aggregations based on edge coloring for problems derived from finite volume methods [11]. In this algorithm, the independent edges are selected and form a subset. With the ratio of volume to area as a constraint, the volumes related to each element of the subset are fused. Then, aggregations can be derived automatically and will have global sub-optimal shape. The time used for aggregation is increased significantly in general with the increase of the problem size. For linear systems derived from nine-point finite difference method, Deng et al. provided an economic scheme to compute the configuration parameters of aggregations for various number of grid points [12]. In reference [13], several aggregations including the two-point scheme based on strong connection and its variants with two and three loops, the Vanek scheme, the second version of the Braess scheme, and a scheme based on cliques are implemented based on sparse data structures. The experiments show that the two-point scheme and its variants are effective in most cases, and the best scheme is always among them. In this paper, a new aggregation algorithm for linear systems with known discrete grid coordinates is provided, and then is compared to the Braess scheme and the twopoint scheme and its variants to verify its efficiency and robustness. In section 2,

A new aggregation based on coordinates partitioning recursively

3

the preconditioned conjugate gradient iteration is briefly described and some basics about the algebraic multigrids based on aggregation will be provided. In section 3, the new aggregation based on coordinates partitioning recursively and its implementation details will be described. In section 4, some experiment results for linear systems from model partial differential equations are given, and some conclusions are drawn in section 5. 2. Conjugate gradient iteration with algebraic multigrid as the preconditioner. Consider the following symmetric positive definite linear system of order N (2.1)

Ax = b,

where A is a given symmetric positive definite matrix, b is a known vector, and x is the unknown solution. We can use the preconditioned conjugate gradient (PCG) iteration to solve (2.1), which is one of the Krylov subspace methods for symmetric cases. It can be described as in figure 2.1 [14], where M is the so-called preconditioner. Generally speaking, any kind of approximate solution scheme for (2.1) can be selected as a preconditioner. But the specific selection is very sophisticated. On the one hand, the more accurate M approaches A−1 , the more rapidly the algorithm PCG will converge. On the other hand, the computation and storage related to M should be cheap compared to the multiplication of A to any vector of dimension N . Otherwise, though the number of PCG iterations is small, the time used for each iteration is very long, leading to very expensive computation cost. In addition, to exploit PCG iterations properly and efficiently, the preconditioner M should be symmetric. ———————————————————————————————– 1. Compute r (0) = b − Ax(0) 2. Set z (0) = M r (0) and s(0) = z (0) 3. For j = 1, maxIts 4. Compute αj = (z (j) , r (j) )/(As(j) , s(j) ) 5. Compute x(j+1) = x(j) + αj s(j) 6. Compute r (j+1) = r (j) − αj As(j) 7. If ||r (j+1) ||2 /||b||2 < , then stop 8. Compute z (j+1) = M r (j+1) 9. Compute βj+1 = (z (j+1) , r (j+1) )/(z (j) , r (j) ) 10. Compute s(j+1) = z (j+1) + βj+1 s(j) 11. Endfor

———————————————————————————————– Fig. 2.1. Algorithm PCG(Preconditioned CG iteration)

Algebraic multigrid can be used to solve sparse linear system (2.1), and therefore it can be used as the preconditioner M of algorithm PCG. The V-cycle algebraic multigrid is adopted here, which can be described as in figure 2.2. The details are abbreviated and can be referred to [15]. In figure 2.2, m is the maximal number of levels allowed and l is the label of the current level. The approximate solution is smoothed with the smoother SlR first. If the derived approximation with SlR is denoted as x(l) , the residual vector r (l) = b(l) − Al x(l) is mapped to the coarser grid by the operator Rl+1,l . On the (l + 1)-th grid, algorithm AMG is invoked again to derive the correction vector. And then it is mapped back to the l-th grid, to correct the approximation x(l) . If the number of levels attains the threshold m, or the sparse linear system is small enough, the recursive process is stopped and the correction on the coarsest level is solved with a certain method.

4

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

———————————————————————————————– 1. If (l = m or the order of Al is small enough) then 2. Compute x(l) = A−1 b(l) l 3. Else 4. Compute x(l) = SlR (x(l) , b(l) ) 5. Compute d(l+1) = Rl+1,l (b(l) − Al x(l) ) 6. Set v (l+1) = 0 7. Compute v (l+1) =MG(v (l+1) , d(l+1) , l + 1) 8. Compute x(l) = x(l) + Pl,l+1 v (l+1) 9. Compute x(l) = SlL (x(l) , b(l) ) 10. Endif

———————————————————————————————– Fig. 2.2. Algorithm AMG(V-cycle multigrid x(l) =MG(x(l) , b(l) , l))

If the smoother S is defined as S(x, b) := (I − SA)x + Sb,

(2.2)

and m = 2, that is, two-levels are used, algorithm AMG is equivalent to the following iteration e(l) := Bl e(l) ,

(2.3) where

Bl = (I − SlL Al ){I − Pl,l+1 (Rl+1,l Al Pl,l+1 )−1 Rl+1,l Al }(I − SlR Al ),

(2.4) (l)

(l)

and e = x − x is the error vector. When we construct the preconditioner Ml based on the iteration (2.3), we have the relation Bl = I − Ml Al , thus Ml = (SlL + SlR ) + Ol − SlL Al SlR + SlL Al Ol Al SlR − (SlL Al Ol + Ol Al SlR ), where Ol = Pl,l+1 (Rl+1,l Al Pl,l+1 )−1 Rl+1,l . Obviously, to ensure that algorithm AMG can be used as a preconditioner of algorithm PCG, the preconditioner Ml should be symmetric. Therefore, we can select Rl+1,l = (Pl,l+1 )T , and SlL = (SlR )T . Then, to apply algorithm AMG, the remaining work is to determine the coarser grid, the prolong operator Pl,l+1 from the coarser grid to the finer, the coefficient matrix of the linear system on the coarser level Al+1 , and the smoother SlR on the finer level. In aggregation based algebraic multigrid methods, the coarser grids are constructed from aggregating the points on the finer grid. In this paper, a new aggregation algorithm based on coordinates partitioning recursively will be given in section 3 and validated in section 4. The basic prolong operator is considered here only. If there are nl points on the l-th grid, the prolong operator from the (l + 1)-th grid to the l-th grid is an nl × nl+1 matrix. If some point i in the l-th grid belongs to the j-th aggregation, it is mapped to the j-th point in the (l + 1)-th level, and the element of Pl,l+1 on the i-th row and the j-th column is 1. All elements not equal to 1 are zeros. The smoother can be derived from any iteration with cheaper cost. For Jacobi iteration, we have (2.5)

x(k+1) := (I − D−1 A)x(k) + D−1 b,

where D is a diagonal matrix with each diagonal element equal to the corresponding element of A. Compared (2.5) with (2.2), it is known that (2.5) is equivalent to (2.2)

A new aggregation based on coordinates partitioning recursively

5

with S = D−1 . We know that D is a diagonal matrix with each diagonal element positive, thus S is symmetric. Therefore, we can select SlR = Dl−1 and SlL = Dl−1 . For Gauss-Seidel, we have (2.6)

x(k+1) := (I − (D − L)−1 A)x(k) + (D − L)−1 b,

where L is a matrix, with the strictly lower triangular part same as that of A, and other elements zeros. Thus, when applied as the smoother of algorithm AMG, we can select SlR = (Dl − Ll )−1 and SlL = (Dl − Ll )−T . When solving linear systems with algorithm PCG preconditioned by multigrid scheme, on each iteration, the grid hierarchy, Al , Pl,l+1 , and SlR are all not changed. Therefore, they can be determined during the setup, and be used directly in the latter iterations. 3. A New aggregation based on coordinates partitioning recursively. It is known that the geometry multigrid is more efficient than algebraic ones for isotropic problems from structured grids, which is due to the preservation of grid shape during the coarsening process to a great extent. Therefore, if the coordinate information of the discrete grid related to linear system (2.1) is known, can we construct algebraic multigrid methods with this kind of property? In this section, such a kind of method for aggregation will be provided. (0) For linear system (2.1), denote the adjacent graph of A with G1 , and the x, y, (0) and z coordinate of node i of G1 with xi , yi , and zi respectively. For the matrix (0) A considered is symmetric, G1 is an un-directed graph. Now we can partition the (0) (1) graph G1 into m0 sub-graphs Gi ,i = 1, 2, . . . , m0 , where m0 can be given as an (1) input parameter beforehand. The vertexes in each Gi can be aggregated into a single point in the coarsest level and there are m0 nodes in the coarsest level. (1) For each Gi , it can be partitioned again into p sub-graphs. For there are m0 sub-graphs in the 1-st level, they can be partitioned into pm0 sub-graphs in all, with each corresponding to a node in level 2. The partitioning process can be continued recursively until some sub-graph in a certain level l is small enough. Each sub-graph on the l-th level can be regarded as an aggregation of the corresponding nodes from the original graph. The details can be described as in figure 3.1. (j) (j) (j+1) (j) Denote the vertex set of Gi with Vi , we have Vi+(k−1)p ⊂ Vk for i = 1, 2, . . . , p, which means that a grid hierarchy is generated naturally. There are m0 nodes in the coarsest level, and for level 2 ≤ j ≤ l − 1, there are m0 pj−1 nodes, with each node on level j related to p nodes on the (j − 1)-the level. And for level l, the (l) (l) (0) sub-graph Gi contains |Gi | nodes, corresponding to those of G1 . To record the grid hierarchy, on output of algorithm ACRP, three arguments are provided, that is, the number of levels l, array f and g. Array f contains the parent on the coarser level for each node. If the number of nodes for all the previous levels is denoted as γ, and the current level number is j, fγ+(k−1)p+i denotes the parent of (j) the i-th part of sub-graph Gk . In addition, gj records the start position of the nodes on level j stored in f . At this time, we can construct the aggregations, from level l to level 1 step by step. In algorithm ACRP, the partitioning is a core process. In this paper, we adopt the algorithm PARTCO, which is based on coordinate information and can be described as in figure 3.2. In algorithm PARTCO, each sub-graph is partitioned into p parts, such that each part is as cubic as possible.

6

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

———————————————————————————————– 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15.

(0)

(1)

Partition G1 into m0 sub-graphs Gi ,i = 1, 2, . . . , m0 For i = 1, 2, . . . , m0 set fi = 1 (1) Set α(1) =min{|Gi | : i = 1, 2, . . . , m0 } Set l = 1 and g1 = 1, g2 = m0 + 1 While(α(l) > p) do Set l = l + 1 and ml−1 = pml−2 Set gl+1 = gl + ml−1 and γ = gl+1 − 1 For k = 1 to ml−2 do (l−1) (l) Partition Gk into p sub-graphs G(k−1)p+i ,i = 1, 2, . . . , p For i = 1, 2, . . . , p set fγ+(k−1)p+i = k Enddo (l) Set α(l) =min{|Gi | : i = 1, 2, . . . , ml−1 } Enddo Set l = l + 1, gl+1 = gl + n and γ = gl+1 − 1 For k = 1 to ml−2 do (l−1)

16. For i ∈ Gk 17. Enddo

set fγ+i = k

———————————————————————————————– Fig. 3.1. Algorithm ACRP(aggregation based on coordinates partitioning recursively). Input: (0) G1 , m0 , p and xi , yi , zi , i = 1, 2, . . . , n. Output: l, array f and g.

———————————————————————————————– 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.

Find the number nx of different x coordinates. Find the number ny of different y coordinates. Find the number nz of different z coordinates. For px = [{pnx nx /(ny nz )}1/3 + 1/2] to 1 step -1 do Set pt = [p/px ] If(px pt = p) break Enddo For py = [(ny pt /nz )1/2 + 1/2] to 1 step -1 do Set pz = [pt /py ] If(pz py = pt ) break Enddo Partion x, y, z into px , py , pz blocks respectively Determine the block number ui ∈ [0, px − 1] for each x(i) Determine the block number vi ∈ [0, py − 1] for each y(i) Determine the block number wi ∈ [0, pz − 1] for each z(i) For i = 1, 2, . . . , |G| set hi = (wi − 1)px py + (vi − 1)px + ui

———————————————————————————————– Fig. 3.2. Algorithm PARTCO(Partitioning a graph according to coordinates). Input: p and xi , yi , zi , i = 1, 2, . . . , |G|. Output: array h, where hi denotes the sub-graph to which node i belongs.

4. Numerical results. In this section, all the experiments are done on a processor of Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (cache 20480 KB). The operating system is Red Hat Linux 2.6.32-279-aftms-TH and the compiler is Intel FORTRAN Version 11.1. For the coefficient matrices of the derived linear systems are symmetric positive definite in all the experiments, algorithm PCG is used for all the cases. The initial approximation solution is selected as the zero vector and the stop criterion is selected as =1E-10. When the multigrid method is used as the preconditioner, the linear system on the coarsest level is solved with ILU(0) preconditioned conjugate gradient and the stop criterion is selected as =1E-10 too. 4.1. Numerical experiments for structured problems. In this subsection, experiments are done for seven sparse linear systems derived from PDE problems on

A new aggregation based on coordinates partitioning recursively

7

structured grids, namely LinSys21, LinSys22, LinSys23, LinSys24, LinSys31, LinSys32 and LinSys33. They are derived with finite difference methods. The first four are derived from a two-dimensional PDE problem with Dirichlet boundaries (4.1)

−a1

∂ ∂u ∂ ∂u (ρ ) − a2 (ρ ) + δu = f, ∂x ∂x ∂y ∂y

where a1 and a2 are constants. The definition region is (0, c)×(0, c) and the function f and the boundary values are given from a known true solution u = 1. There are n + 2 points in each dimension and the value u(xi , yj ) is defined as ui,j for any continuous function u, where xi = ih, yj = jh, i, j = 0, 1, . . . , n + 1, and h = c/(n + 1). To solve (4.1) numerically, we apply central difference scheme for the 2D test cases except LinSys23, leading to ∂u

−a1

∂u

∂u (ρ ∂y )i,j+1/2 − (ρ ∂y )i,j−1/2 (ρ ∂u ∂x )i+1/2,j − (ρ ∂x )i−1/2,j − a2 + (δu)i,j = fi,j . h h

Apply central difference scheme to each differential term of the above equation again, we can derive the following discrete form (4.2)

−a2 ρi,j+1/2 ui,j+1 − a1 ρi+1/2,j ui+1,j −a1 ρi−1/2,j ui−1,j − a2 ρi,j−1/2 ui,j−1 + λi,j ui,j = h2 fi,j ,

with ui,j = 1 on the boundaries, where λi,j = δi,j h2 + a2 ρi,j+1/2 + a1 ρi+1/2,j + a1 ρi−1/2,j + a2 ρi,j−1/2 . For LinSys21, a1 = a2 = ρ = 1, δ = 0 and the discrete form (4.2) is reduced to −ui,j−1 − ui−1,j + 4ui,j − ui+1,j − ui,j+1 = h2 fi,j . For LinSys22, a1 = a2 = 1 and the equation (4.2) is reduced to −ρi,j+1/2 ui,j+1 − ρi+1/2,j ui+1,j − ρi,j−1/2 ui,j−1 − ρi−1/2,j ui−1,j + λi,j ui,j = h2 fi,j , where ρ = ρk and δ = δk when (x, y) in Rk for k = 1, 2 and 3, and R1 = {(2, 2.1] × (1, 2.1]} ∪ {(2, 2.1] × (1, 2.1]}, R2 = (1, 2) × (1, 2), R3 = {[0, 2.1] × [0, 1)} ∪ {[0, 1) × [0, 2.1]}, and ρ1 = 1, ρ2 = 2 × 103 , ρ3 = 3 × 105 , δ1 = 0.02, δ2 = 3, δ3 = 500. For LinSys23, ρ = 1, δ = 0, a1 = a2 = 1, and the equation (4.1) is simplified to −

∂ 2u ∂ 2 u − 2 = f. ∂x2 ∂y

8

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

We can use the values on the point (xi , yj ) and the other eight points surrounding it to approximate the left-hand side. Through Taylor expansions and some simplifications, the following high-precision discrete form can be derived −ui−1,j−1 − 4ui,j−1 − ui+1,j−1 − 4ui−1,j + 20ui,j − 4ui+1,j −ui−1,j+1 − 4ui,j+1 − ui+1,j+1 = h2 fi,j . For LinSys24, ρ = 1, δ = 0, a1 = 1,a2 = 100, and the discrete form (4.2) is reduced to −100ui,j−1 − ui−1,j + 202ui,j − ui+1,j − 100ui,j+1 = h2 fi,j . LinSys31 to LinSys33 are derived from a 3D partial differential equation problem with Dirichlet boundaries −b1

(4.3)

∂2u ∂2u ∂2u − b − b = f, 2 3 ∂x2 ∂y 2 ∂z 2

The definition region is (0, 1) × (0, 1) × (0, 1) and f and the boundary values are given from a known true solution u = 1. There are n + 2 points in each dimension and u(xi , yj , zk ) is denoted as ui,j,k for any function u, where xi = ih, yj = jh, zk = kh for i, j, k = 0, 1, . . . , n + 1, and h = 1/(n + 1). For LinSys31 and LinSys32, apply central difference scheme, we have ∂u ( ∂u ui+1,j,k + ui−1,j,k − 2ui,j,k ∂2u ∂x )i+1/2,j,k − ( ∂x )i−1/2,j,k = = 2 ∂x h h2

and similar approximations for the other two left-hand terms of equation (4.3), we can derive the discrete form (4.4)

−b1 ui−1,j,k − b2 ui,j−1,k − b3 ui,j,k−1 + (2b1 + 2b2 + 2b3 )ui,j,k −b1 ui+1,j,k − b2 ui,j+1,k − b3 ui,j,k+1 = h2 fi,j,k .

For LinSys31, b1 = b2 = b3 = 1, and the equation (4.4) is reduced to −ui−1,j,k − ui,j−1,k − ui,j,k−1 + 6ui,j,k − ui+1,j,k − ui,j+1,k − ui,j,k+1 = h2 fi,j,k . For LinSys32, b1 = 1, b2 = 10, b3 = 100, and the equation (4.4) is reduced to −ui−1,j,k − 10ui,j−1,k − 100ui,j,k−1 + 222ui,j,k −ui+1,j,k − 10ui,j+1,k − 100ui,j,k+1 = h2 fi,j,k . For LinSys33, b1 = b2 = b3 = 1, and apply similar schemes to LinSys23, we can derive −

i+1 X

j+1 X

k+1 X

ui0 ,j 0 ,k0 + 27ui,j,k = h2 fi,j,k .

i0 =i−1 j 0 =j−1 k0 =k−1

The results are given in figure 4.1 and table 4.2 to 4.8. For convinience some acronyms are used, which are described in table 4.1, where Cgrd is defined as the ratio of the total number of grid points on each level to that on the finest level, and Cops is defined as the ratio of the total number of non-zeros in the coefficient matrices on each level to that on the finest level. In the tests for the new scheme

A new aggregation based on coordinates partitioning recursively

9

Table 4.1 Acronyms used and their descriptions

Condition number of the preconditioned matrix MA

Acronym ACRP2 ACRP4 ACRP8 ASTR2 ASTR4 ASTR8 AGGBR Mthd Nlev Cgrd Cops Its Stm Itm Ttm

description New scheme ACRP with each sub-graph partitioned into 2 parts each time New scheme ACRP with each sub-graph partitioned into 4 parts each time New scheme ACRP with each sub-graph partitioned into 8 parts each time The two-point basic scheme based on strongest connections The scheme with two loops of ASTR2(with four points aggregated at most) The scheme with three loops of ASTR2(with eight points aggregated at most) The second scheme described in reference [10] Method The number of levels of the multigrid The complexity of the grids The complexity of the operators The number of PCG iterations used The setup time in seconds The time used for iteration in seconds The total computation time, that is, the summation of Stm and Itm

250 225 200

ACRP2(Jac) ASTR2(Jac) ACRP2(GS) ASTR2(GS)

175 150 125 100 75 50 25 0 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Number of points in one dimension(n)

Fig. 4.1. Condition number of the preconditioned matrix M A for LinSys21, where Jac and GS are related to the Jacobi and Gauss-Seidel smoothers respectively. It shows that for Jacobi smoothing, the condition number from the new scheme ARCP2 is much smaller than the classical one ASTR2. And for Gauss-Seidel smoothing, the condition number from ARCP2 is comparable to that from ASTR2.

ARCP, m0 = 100 is used except in figure 4.1, where the size of the problem is much smaller and m0 = 16 is used instead. From figure 4.1 and table 4.2, it is clear that, for LinSys21, which is an isotropic problem, when Jacobi smoothing is used, both the iterative and the whole performance of the new scheme ACRPp are much better than that of the classical scheme ASTRp. When Gauss-Seidel smoothing is used, the iterative performance of ACRPp is comparable to that of ASTRp. Due the longer setup time, the total computation time of ACRPp is slightly larger than that of ASTRp. From table 4.3, we can see that for LinSys22, which is a discontinuous problem, when Jacobi smoothing is used, the new scheme ACRPp is much better than ASTRp again. When Gauss-Seidel smoothing is used, both the iterative and the whole per-

10

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

Table 4.2 Results from LinSys21(2D isotropic problem). When Jacobi smoothing is used, Its, Itm and Ttm of the new scheme ACRPp are all by far less than that of the classical scheme ASTRp, which means that the performance of the new scheme is much better. And when Gauss-Seidel smoothing is used, the iterative performance of the new scheme ACRPp is comparable to that of ASTRp.

n

512

1024

Mthd ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

Cgrd 1.7809 1.9998 1.5207 1.3333 1.2232 1.1428 1.3333 1.7812 1.9999 1.5208 1.3333 1.4464 1.1429 1.3333

Cops 1.8197 1.9975 1.5196 1.3322 1.2299 1.1422 1.3322 1.8214 1.9988 1.5202 1.3328 1.4510 1.1425 1.3328

Nlev 12 13 7 7 5 5 7 14 15 8 8 6 6 8

Stm 1.28 0.51 0.94 0.50 0.76 0.48 16.6 6.88 1.99 5.15 1.96 4.65 1.89 263.

Its 66 319 69 354 75 355 354 92 578 94 655 103 658 655

Jacobi Itm 4.49 23.8 4.10 19.0 3.76 17.3 15.0 25.2 173. 22.3 141. 23.5 128. 115.

Ttm 5.77 24.3 5.04 19.5 4.52 17.8 31.6 32.1 175. 27.5 143. 28.2 130. 378.

Gauss-Seidel Its Itm Ttm 60 5.08 6.36 61 5.70 6.21 63 4.53 5.47 66 4.18 4.68 69 4.21 4.97 71 4.13 4.61 66 3.50 20.1 83 28.1 35.0 85 31.9 33.9 86 25.0 30.2 92 23.2 25.2 94 26.3 31.0 110 26.1 28.0 92 19.8 283.

Table 4.3 Results from LinSys22(2D discontinuous problem). When Jacobi smoothing is used, the new scheme ACRPp is much better than the classical scheme ASTRp. And when Gauss-Seidel smoothing is used, the new scheme ACRPp is slightly better than ASTRp.

n

512

1024

Mthd ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

Cgrd 1.7809 1.9998 1.5207 1.3333 1.2232 1.1429 1.3334 1.7812 2.0000 1.5208 1.3334 1.4464 1.1429 1.3334

Cops 1.8197 2.0257 1.5196 1.3418 1.2299 1.1580 1.3362 1.8214 2.0270 1.5202 1.3424 1.4510 1.1583 1.3355

Nlev 12 13 7 7 5 5 7 14 15 8 8 6 6 8

Stm 1.28 0.51 0.94 0.50 0.75 0.48 16.6 6.87 2.03 5.15 1.97 4.67 1.89 263.

Its 66 282 68 470 78 498 386 90 463 94 794 106 772 705

Jacobi Itm 4.53 21.1 4.01 25.6 3.91 24.0 16.5 24.6 141. 22.3 170. 24.3 151. 123.

Ttm 5.81 21.6 4.95 26.1 4.67 24.5 33.1 31.5 143. 27.5 172. 28.9 153. 386.

Gauss-Seidel Its Itm Ttm 57 4.86 6.15 64 5.95 6.46 60 4.34 5.28 75 4.78 5.28 70 4.25 5.00 85 5.06 5.54 76 3.95 20.6 79 26.9 33.7 92 34.9 36.9 83 24.2 29.3 117 29.5 31.5 95 26.6 31.3 137 32.8 34.7 106 22.7 285.

formance of ACRPp are slightly better than that of ASTRp. From table 4.4, it can be seen that for LinSys23, which has stronger connectivity along axis directions, the iterative performance of the new scheme ACRPp is comparable to that of ASTRp but the total computation time is slightly longer. From table 4.6, it is clear that for LinSys31, which is a three-dimensional isotropic problem, when Jacobi smoothing is used, the performance of the new scheme ACRPp is slightly better. And when Gauss-Seidel smoothing is used, the new scheme is much better. From table 4.8, we can see that for LinSys33, which is a three-dimensional isotropic problem with more connections, the performance of the new scheme is significantly better. From table 4.5, it can be seen that for LinSys24, which is a two-dimensional problem with strong anisotropic property along different axes, when Jacobi smoothing is used, the new scheme is much better again. But when Gauss-Seidel smoothing is

A new aggregation based on coordinates partitioning recursively

11

Table 4.4 Results from LinSys23(2D problem with stronger connectivity along axes). The iterative performance of the new scheme ACRPp is comparable to that of the classical scheme ASTRp but the total computation time is slightly longer.

n

512

1024

Mthd ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

Cgrd 1.7809 1.9998 1.5207 1.3333 1.2232 1.1428 1.3333 1.7812 1.9999 1.5208 1.3333 1.4464 1.1429 1.3333

Cops 3.1536 3.5890 2.7311 2.3943 2.1897 2.0530 2.3943 3.1592 3.5946 2.7343 2.3972 2.5960 2.0551 2.3972

Nlev 12 13 7 7 5 5 7 14 15 8 8 6 6 8

Stm 1.65 0.85 1.22 0.81 0.95 0.76 21.5 8.50 3.40 6.37 3.18 5.71 3.02 284.

Its 65 62 68 65 73 77 65 91 86 92 91 101 116 91

Jacobi Itm 6.86 7.50 6.34 5.53 5.50 5.66 4.42 38.8 41.5 34.3 31.1 35.9 35.1 25.1

Ttm 8.51 8.35 7.56 6.34 6.45 6.41 25.9 47.3 44.9 40.7 34.3 41.6 38.2 309.

Gauss-Seidel Its Itm Ttm 60 7.98 9.62 61 8.56 9.41 63 7.34 8.56 66 6.92 7.73 68 6.35 7.30 71 6.06 6.82 66 5.26 26.7 82 43.6 52.1 84 47.7 51.1 86 39.8 46.1 91 38.3 41.5 93 41.1 46.8 109 38.2 41.3 91 29.2 313.

Table 4.5 Results from LinSys24(2D problem with strong anisotropic along different axes). When Jacobi smoothing is used, the new scheme ACRPp is much better. But when Gauss-Seidel smoothing is used, the classical scheme ASTRp is much better.

n

512

1024

Mthd ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

Cgrd 1.7809 1.9998 1.5207 1.3333 1.2232 1.1428 1.3333 1.7812 1.9999 1.5208 1.3333 1.4464 1.1429 1.3333

Cops 1.8197 1.9975 1.5196 1.3322 1.2299 1.1422 1.3322 1.8214 1.9988 1.5202 1.3328 1.4510 1.1425 1.3328

Nlev 12 13 7 7 5 5 7 14 15 8 8 6 6 8

Stm 1.27 0.42 0.94 0.39 0.75 0.38 16.6 6.87 1.60 5.15 1.56 4.65 1.49 263.

Its 182 319 171 354 196 355 549 243 578 234 655 258 658 930

Jacobi Itm 12.6 18.7 10.0 15.0 9.79 13.7 23.6 66.4 139. 55.3 113. 58.7 102. 163.

Ttm 13.9 19.1 10.9 15.4 10.5 14.0 40.3 73.3 140. 60.5 114. 63.3 104. 426.

Gauss-Seidel Its Itm Ttm 133 11.1 12.5 61 4.41 4.83 125 8.85 9.79 66 3.29 3.68 139 8.56 9.31 71 3.31 3.69 127 6.71 23.4 179 60.9 67.7 85 25.5 27.1 171 49.5 54.7 92 18.6 20.2 187 52.2 56.8 110 20.8 22.3 175 37.5 300.

used, the classical scheme ASTRp is much better. From table 4.7, it is clear that for LinSys32, which is a three-dimensional problem with strong anisotropic property along different axes, the classical scheme ASTRp is better again, but the privileges over ACRPp is weakened. From the above discussions, we can see that the new scheme is significantly better for isotropic problems and most other cases. Only for problems with strong anisotropic property along different axes, the new scheme may be poorer than the classical schemes based on strong connections. In addition, from the listed results, it is clear that the Braess scheme can achieve much better iterative performance but its setup time is too long, leads to poor whole performance in general. 4.2. Numerical experiments for unstructured problems. In this subsection, numerical experiments are done for two sparse linear systems, Lshape and Refine,

12

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

Table 4.6 Results from LinSys31(3D isotropic problem). When Jacobi smoothing is used, the performance of the new scheme ACRPp is slightly better. And when Gauss-Seidel smoothing is used, the new scheme ACRPp is much better.

n

64

128

Mthd ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

Cgrd 1.3902 1.9998 1.5207 1.3333 1.2232 1.1428 1.3333 1.3906 2.0000 1.2604 1.3333 1.2232 1.1429 1.3333

Cops 1.9205 2.7520 2.1373 1.8378 1.6879 1.5774 1.8374 1.9372 2.7755 1.7544 1.8519 1.6998 1.5884 1.8516

Nlev 11 13 7 7 5 5 7 14 16 8 9 6 6 9

Stm 0.86 0.70 0.74 0.70 0.54 0.68 16.73 8.58 5.65 5.96 5.54 5.07 5.23 1049

Its 76 52 44 57 77 60 58 108 81 117 91 114 105 94

Jacobi Itm 4.44 5.04 2.78 3.96 3.95 3.73 3.26 50.4 63.8 50.3 51.2 47.9 52.2 42.4

Ttm 5.30 5.74 3.52 4.66 4.49 4.41 20.0 59.0 69.4 56.3 56.7 52.9 57.4 1091

Gauss-Seidel Its Itm Ttm 24 1.66 2.52 24 2.96 3.66 27 2.06 2.80 28 2.30 3.00 29 1.82 2.35 29 2.27 2.96 30 2.07 18.8 32 17.9 26.5 32 31.9 37.6 37 19.0 24.9 38 25.2 30.7 41 20.5 25.5 39 24.1 29.3 45 25.7 1075

Table 4.7 Results from LinSys32(3D problem with strong anisotropic along different axes). The iterative performance of the new scheme ACRPp is worser than that of the classical scheme ASTRp.

n

64

128

Mthd ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

Cgrd 1.3902 1.9998 1.5207 1.3333 1.2232 1.1428 1.3333 1.3906 2.0000 1.2604 1.3333 1.2232 1.1429 1.3333

Cops 1.9205 2.7361 2.1373 1.8285 1.6879 1.5714 1.8364 1.9372 2.7671 1.7544 1.8470 1.6998 1.5853 1.8511

Nlev 11 13 7 7 5 5 7 14 16 8 9 6 6 9

Stm 0.86 0.55 0.74 0.56 0.54 0.54 16.7 8.64 4.51 5.98 4.55 5.07 4.36 1050

Its 85 60 88 62 85 65 85 131 89 137 92 126 99 126

Jacobi Itm 4.92 4.63 5.58 3.43 4.52 3.18 4.71 61.5 55.3 59.2 41.3 53.3 39.3 57.0

Ttm 5.77 5.17 6.33 3.99 5.05 3.73 21.4 70.1 59.8 65.2 45.8 58.3 43.6 1107

Gauss-Seidel Its Itm Ttm 50 3.60 4.45 31 2.85 3.40 51 4.03 4.78 38 2.56 3.12 54 3.44 3.98 44 2.51 3.05 56 3.63 20.4 58 33.5 42.1 42 31.2 35.7 66 34.9 40.9 52 28.7 33.2 69 35.6 40.7 57 26.6 31.0 77 40.9 1091

which are derived from a two-dimensional Poisson equation (4.5)



∂ 2u ∂ 2 u − 2 = f, ∂x2 ∂y

on unstructured grids with Dirichlet boundary condition. The discretizations are based on finite element method, where the piecewise linear basis function and triangular elements with three nodes are used. The right-hand function f and the boundary values are set with a true solution u(x, y) = sin(πx)sin(πy) + x. The linear system Lshape is derived on an L-shape grid, described as in figure 4.2, where the definition region is [0, 1]2 \(1/4, 1]2. In figure 4.2, the step on each dimension is h = 1/40 for clarity. And two cases with larger scale are experimented, with h = 1/512 and h = 1/1024 respectively. For the case h = 1/512, there are 115713

A new aggregation based on coordinates partitioning recursively

13

Table 4.8 Results from LinSys33(3D isotropic problem with more connections). The performance of the new scheme ACRPp is significantly better.

n

64

128

Mth ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 AGGBR

C(grid) 1.3902 1.9998 1.5207 1.3333 1.2232 1.1428 1.3333 1.3906 2.0000 1.2604 1.3333 1.2232 1.1429 1.3333

C(oper) 7.1861 10.356 7.8151 6.9305 6.3740 5.9591 6.9270 7.3338 10.574 6.6726 7.0626 6.4874 6.0631 7.0607

Nlev 11 13 7 7 5 5 7 14 16 8 9 6 6 9

Stm 2.14 2.51 2.01 2.33 1.42 2.13 17.6 20.7 20.8 14.8 19.0 13.0 17.7 1059

Jacobi Itm 4.35 7.02 5.26 5.87 4.67 4.81 4.96 49.1 81.3 52.9 67.2 55.4 57.8 62.9

Its 25 23 28 28 30 26 30 34 33 40 39 43 38 46

Ttm 6.48 9.53 7.28 8.20 6.09 6.94 22.6 69.7 102. 67.7 86.2 68.4 75.6 1122

Its 22 21 24 26 27 26 27 30 30 35 36 38 37 40

Gauss-Seidel Itm Ttm 4.89 7.03 7.66 10.2 5.79 7.80 6.99 9.31 5.35 6.76 5.69 7.83 5.34 23.0 55.1 75.8 89.0 110. 58.7 73.5 79.1 98.1 62.0 75.0 66.6 84.3 65.3 1124

nodes and 229376 elements in all. Except the boudary points with known solution, there are 113665 unkowns. For the case h = 1/1024, there are 460801 nodes and 917504 elements in all. And there are 456705 unkowns. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Fig. 4.2. Skematic diagram of the discrete grid of an L-shape region [0, 1]2 \(1/4, 1]2 . Uniform triangulation is used and the discrete step in each dimension is h = 1/40.

The linear system Refine is derived on a hybrid resolution grid, described as in figure 4.3, where the definition region is [0, 1]2 and a refined region [1/4, 3/4]2 is triangulated with higher resolution. The refined boxes and the others are bridged with interface boxes. Each interface box is partitioned into three triangular elements and the other boxes in the refined region are derived with step h/2, where h is 1/(n − 1) and n is the number of points in each dimension when no refinement is used. In figure 4.3, the argument n = 25. And two larger cases are experimented again, with n = 257 and n = 513 respectively. For the case n = 257, there are 113929 nodes and 226832 elements in all. The number of unknowns is 112905. For the case n = 513, there are

14

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

Table 4.9 Results from Lshape(uniform triangulation of an L-shape region). The performance of the new scheme ACRPp is significantly better.

1/h

512

1024

Mth ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8

C(grid) 1.2883 1.9998 1.0961 1.3334 1.0824 1.1429 1.2870 2.0000 1.0957 1.3334 1.0205 1.1429

C(oper) 1.2754 1.9950 1.0939 1.3312 1.0809 1.1414 1.2766 1.9976 1.0946 1.3323 1.0201 1.1421

Nlev 15 12 8 7 6 5 17 14 9 8 6 6

Stm 0.78 0.35 0.46 0.33 0.38 0.33 4.59 1.28 2.88 1.21 2.30 1.18

Jacobi Itm 2.37 15.5 2.90 10.8 2.77 9.85 9.90 111. 15.4 79.9 36.1 72.4

Its 111 377 161 445 156 448 114 691 209 805 529 809

Ttm 3.15 15.8 3.37 11.1 3.15 10.2 14.5 112. 18.3 81.1 38.4 73.6

Gauss-Seidel Its Itm Ttm 48 1.49 2.27 48 2.34 2.69 55 1.43 1.90 58 1.81 2.15 63 1.62 2.00 71 1.91 2.23 65 8.12 12.7 66 12.8 14.0 75 7.95 10.8 80 10.1 11.3 89 8.75 11.0 89 9.66 10.8

457225 nodes and 912400 elements in all. And there are 455177 unkowns. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

0

0.2

0.4

0.6

0.8

1

Fig. 4.3. Skematic diagram of the hybrid resolution grid for n = 25 on [0, 1]2 . The refined region is [1/4, 3/4]2 , including the interfaces. Each interface box is partitioned into three triangular elements and the other boxes in the refined region are derived with half discrete step of the non-refined region.

In this subsection, m0 = 2 is used for the new scheme ARCP and the other arguments are the same as in subsection 4.1. From table 4.9 and table 4.10, it is clear that, for both the problems on L-shape region and those on hybrid resolution grid, whenever Jacobi or Gauss-Seidel smoothing is used, the iterative and the whole performance of the new scheme ACRPp are much better than that of the classical scheme ASTRp respectively. 5. Conclusion. We provide a new aggregation method from top to bottom, which is completely different from the classical schemes. The grid hierarchy is constructed by graph partitioning recursively, and the partitioning is based on coordinate information. The results from the solution of model partial differential equations from some structured and two unstructured grids show that the new algorithm has better

A new aggregation based on coordinates partitioning recursively

15

Table 4.10 Results from Refine(hybrid resolution grid). The performance of the new scheme ACRPp is significantly better again.

n

257

513

Mth ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8 ACRP2 ASTR2 ACRP4 ASTR4 ACRP8 ASTR8

C(grid) 1.2902 1.9998 1.0967 1.3334 1.0829 1.1430 1.2880 2.0000 1.0960 1.3334 1.1645 1.1429

C(oper) 1.2787 1.9969 1.0954 1.3321 1.0820 1.1421 1.2788 1.9986 1.0953 1.3327 1.1640 1.1425

Nlev 15 12 8 7 6 5 17 14 9 8 7 6

Stm 0.80 0.35 0.46 0.34 0.38 0.33 4.74 1.28 2.93 1.21 2.53 1.18

Its 124 427 219 412 183 458 127 708 236 765 198 773

Jacobi Itm 2.72 17.2 4.05 9.98 3.34 10.0 11.3 113. 17.9 75.4 15.9 68.9

Ttm 3.52 17.5 4.51 10.3 3.72 10.3 16.1 115. 20.9 76.6 18.4 70.1

Gauss-Seidel Its Itm Ttm 66 1.92 2.72 83 4.15 4.50 80 1.95 2.42 104 3.08 3.42 91 2.19 2.57 131 3.62 3.94 90 10.6 15.3 120 23.8 25.1 110 11.0 13.9 154 18.5 19.7 125 13.2 15.8 179 20.0 21.2

performance and is more robust than the widely used classical algorithms in most cases, especially for approximately isotropic problems. For anisotropic problems, the partitioning scheme used in this paper does not exploit the information of the matrix entries yet, leading to performance degradation, which may be improved by more sophisticated partitioning schemes. The partitioning scheme in this paper is very simple, for more unstructured problems, the partitioning scheme may need to be improved too, to ensure nearly equal number of points in each sub-graph on the same level. Otherwise, the construction process may be stopped quickly due to small sub-graphs on some level where many others are still very large, which is harmful to the efficiency of the derived multigrid method. It is a possible way to apply the schemes provided in the well-known packages such as Metis or Chaco. But it will be very hard to achieve satisfactory performance by simply applying these schemes. Though the graphs can be partitioned into subgraphs with nearly same number of vertexes as in this paper, there is another more important condition we should focus on. Each subgraph should be connected as far as possible, to ensure the applicability of the grid transfer operators. Otherwise, the overall performance of the multigrid must be degraded. It is also the main reason to use coordinates to partition graphs here. It will be one of our future works to apply this kind of sophisticated schemes in a suitable way. Further, for many temporally dependent problems, the coefficient matrix becomes more and more ill-conditioned with time-stepping, it is more difficult to solve. The extension of the algorithm to this kind of problems is also a focus of our future work. In addition, the parallelization is not considered in this paper yet and will be investigated in the future too. Acknowledgments. This work is funded by NSFC(61379022) and the authors thank the referees for their helpful suggestions. REFERENCES [1] R. Wienands, W. Joppich, Practical Fourier analysis for multigrid methods, Taylor and Francis Inc., 2004 [2] Y. Notay, Aggregation-based algebraic multilevel preconditioning, SIAM Journal On Matrix Analysis And Applications, 27:4(2006), pp.998-1018

16

JIAN-PING WU, FU-KANG YIN, JUN PENG , AND JIN-HUI YANG

[3] H. Kim, J. Xu, and L. Zikatanov, A multigrid method based on graph matching for convectiondiffusion equations, Numer. Linear Algebra Appl., 10(2003), 181-195 [4] P. D’Ambra, A. Buttari, D. di Serafino, S. Filippone, S. Gentile, B. Ucar, A novel aggregation method based on graph matching for algebraic multigrid preconditioning of sparse linear systems, in: International Conference on Preconditioning Techniques for Scientific & Industrial Applications 2011, May 2011, Bordeaux, France [5] Y. Notay, Aggregation-Based Algebraic Multigrid for Convection-Diffusion Equations, SIAM J. Sci. Comput., 34:4(2012), A2288-A2316 [6] Dendy, J. E., Jr., & Moulton, J. D.,Black Box Multigrid with coarsening by a factor of three, Numerical Linear Algebra With Applications, 17:2-3(2010), 577-598 [7] Vanek P., Mandel J., and Brezina M.,Algebraic multigrid by smoothed aggregation for second order and fourth order elliptic problems, Computing, 1996, 56:179-196 [8] P. Kumar, Aggregation based on graph matching and inexact coarse grid solve for algebraic two grid, International Journal of Computer Mathematics, 91:5(2014), 1061-1081 [9] Chen, M.-H., & Greenbaum, A.,Analysis of an aggregation-based algebraic two-grid method for a rotated anisotropic diffusion problem, Numerical Linear Algebra With Applications, 22:4(2015), 681-701 [10] D. Braess, Towards algebraic multigrid for elliptic problems of second order, Computing, 55(1995), 379-393 [11] N. Okamoto, K. Nakahashi, S. Obayashi, A coarse grid generation algorithm for agglomeration multigrid method on unstructured grids, In Proceedings of 36th Aerospace Sciences Meeting and Exhibit , volume 98-0615, AIAA, Reno, 1998 [12] Deng, L.-J., Huang, T.-Z., Zhao, X.-L., Zhao, L., & Wang, S., An economical aggregation algorithm for algebraic multigrid. (AMG)., Journal Of Computational Analysis And Applications, 16:1(2014), 181-198 [13] Wu Jian-ping, Yin Fu-kang, Peng Jun, Yang Jin-hui, Research on Aggregations for Algebraic Multigrid Preconditioning Methods, In: 2017 2nd International Conference on Computer Science and Technology [CST2017], Guilin, China, 2017 [14] Y. Saad, Iterative methods for sparse linear systems, Boston: PWS Pub. Co., 1996 [15] C. Wagner, Introduction to algebraic multigrid, Course Notes, University of Heidelberg, 1998/1999; available at: http://www.iwr.uni-heidelberg.de/ Christian.Wagner/