An extended basis inexact shift–invert Lanczos for the efficient solution of large-scale generalized eigenproblems

M. Rewieński*, A. Lamecki, M. Mrozowski
Gdańsk University of Technology, ul. Narutowicza 11/12, 80-233 Gdańsk, Poland

Computer Physics Communications 184 (2013) 2127–2135

Article history: Received 10 May 2012; received in revised form 15 April 2013; accepted 17 April 2013; available online 24 April 2013.

Keywords: Generalized eigenproblems; Shift–invert Lanczos; Multilevel preconditioner; PCG solver; Jacobi Orthogonal Component Correction

Abstract

This paper proposes a technique, based on the Inexact Shift–Invert Lanczos (ISIL) method with Inexact Jacobi Orthogonal Component Correction (IJOCC) refinement and a preconditioned conjugate-gradient (PCG) linear solver with multilevel preconditioner, for finding several eigenvalues of generalized symmetric eigenproblems. Several eigenvalues are found by constructing (with the ISIL process) an extended projection basis. The presented results of numerical experiments confirm that the technique can be effectively applied to challenging, large-scale problems characterized by very dense spectra, such as resonant cavities with spatial dimensions which are large with respect to the wavelengths of the resonating electromagnetic fields. It is also shown that the proposed scheme based on inexact linear solves delivers superior performance compared to methods which rely on exact linear solves, indicating the tremendous potential of the 'inexact solve' concept. Finally, the scheme which generates an extended projection basis is found to provide a cost-efficient alternative to classical deflation schemes when several eigenvalues are computed.

© 2013 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +48 583472917. E-mail addresses: [email protected] (M. Rewieński), [email protected] (A. Lamecki), [email protected] (M. Mrozowski).
doi:10.1016/j.cpc.2013.04.006

1. Introduction

Symmetric generalized eigenvalue problems are encountered in many areas of physical modeling, including structural simulations, modeling of electromagnetic fields, hydrodynamics, and solid-state physics. They are typically characterized by large scale, with sparse matrix sizes reaching millions. Moreover, the eigenvalues of interest are often located far from the spectrum edges, and the matrix operator may have a high-order nullspace. For such large-scale eigenproblems, Krylov subspace-based methods such as the Lanczos and Arnoldi algorithms (e.g. the Implicitly Restarted Arnoldi Method (IRAM) [1,2]), the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method [3], or Jacobi–Davidson type techniques (e.g. JDQZ [4–7]), as well as their combinations (e.g. [8–10]), have become the methods of choice over the last two decades thanks to their capacity and performance. These methods have also proven extremely flexible, since the only required computation involving the problem-specific matrix operator is the matrix–vector product, and the matrix operator can be represented in any form, admitting various efficient implicit representations.

Yet, for generalized eigenproblems, and when the eigenvalues of interest are located far from either edge of the spectrum, the matrix–vector product becomes much more complex: it involves an inversion of the matrix operator, most often through a shift-and-invert operation, or nullspace filtering (if shift-and-invert is avoided, as in LOBPCG [3]), which in either case effectively requires solving a large-scale system of linear equations [10]. Hence, matvec operations have become the major bottleneck in terms of cost and complexity for the described class of problems. The difficulties mount further because most earlier approaches required high-accuracy solutions (i.e. high-accuracy 'matvecs'), e.g. in order to maintain orthogonality of the projection vectors in the (exact) shift-and-invert Lanczos approach, thereby challenging iterative solvers for large, sparse linear systems (e.g. GMRES, CG), which often experience stagnation or nonconvergence. (Direct linear solvers are not applicable for large-scale problems, since they consume too much memory.)

To address this issue, the concept of applying inexact linear solves instead of exact ones has more recently been proposed by Golub et al. [11,12] in the context of shift–invert methods. (The idea of approximate solves may also be encountered in the Jacobi–Davidson method [7,13].) Following this concept, new algorithms were presented, including inexact inverse iteration [11], inexact shift–invert Lanczos (ISIL), inexact shift–invert Arnoldi (ISIA), and Inexact Jacobi Orthogonal Component Correction (IJOCC) [12]. Furthermore, hybrid schemes based on ISIL followed by IJOCC have been proposed, which avoid the early stagnation of inexact Krylov methods by performing efficient solution refinement with the Newton-like IJOCC algorithm. This paper further explores and expands the ISIL–IJOCC scheme, which has so far seen relatively little application (cf. [12,14,15]), by


equipping it with a robust large-scale PCG solver with an advanced multilevel preconditioner, and by altering the usage of the ISIL process in order to efficiently find multiple eigenvalues without applying deflation.

The rest of the paper is organized as follows: Section 2 provides background information on the ISIL and IJOCC algorithms. Section 3 describes the proposed extensions to the ISIL–IJOCC scheme, including extended basis generation for finding multiple eigenvalues, and the PCG solver with multilevel preconditioner. Section 4 presents the test example and the results of numerical experiments, followed by conclusions in Section 5.

2. Background

This paper focuses on solving large-scale symmetric generalized eigenvalue problems:

Kx = λMx        (1)

where K and M are both positive semidefinite symmetric matrices, λ is an unknown eigenvalue and x is the corresponding unknown eigenvector. More specifically, the goal is to find a few (e.g. 10 or 20) eigenvalues (with corresponding eigenvectors) from a specific part of the spectrum, as defined by frequency shift σ . To this end, a Krylov subspace solver is used jointly with shift–invert scheme, as described below.
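To fix intuition for the shift–invert scheme: eigenvalues λ of (1) near the shift σ become the largest-magnitude eigenvalues µ = 1/(λ − σ) of the operator (K − σM)⁻¹M, which is precisely what a Krylov method finds fastest. A small numpy illustration with hypothetical matrices (M = I for simplicity; not data from the paper):

```python
import numpy as np

# Hypothetical small SPD pencil.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
K = B @ B.T + 6 * np.eye(6)          # eigenvalues >= 6
M = np.eye(6)
sigma = 5.0                          # shift below the spectrum edge

# Eigenvalues mu of the shift-invert operator inv(K - sigma*M) @ M ...
mu = np.linalg.eigvals(np.linalg.solve(K - sigma * M, M))
# ... map back to eigenvalues of the original pencil: lambda = sigma + 1/mu.
lam_mapped = np.sort(sigma + 1.0 / mu.real)
lam_direct = np.sort(np.linalg.eigvalsh(K))   # M = I: standard problem
```

The eigenvalues closest to σ correspond to the largest µ, so a Krylov process applied to the shift–invert operator converges to them first.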

2.1. Inexact shift–invert Lanczos algorithm

The Inexact Shift–Invert Lanczos (ISIL) method proposed by Golub, Sun et al. [12,16,17] is a Krylov subspace method based on the Lanczos iteration. Algorithm 1 summarizes its steps. First, an approximate shift–invert operation (with shift σ) is performed in Step (4). Then, the three-term Lanczos recurrence is computed in Step (5). Three major differences between ISIL and the standard shift–invert Lanczos method should be noted:

1. An inexact linear solve is performed in Step (4), which enables substantial computational savings during the iterative linear solves.
2. Complete reorthogonalization of the next approximate Lanczos vector with respect to the previous ones is done in Step (6), since any two approximate Lanczos vectors generated by ISIL are (by construction) non-orthogonal.¹
3. The approximate Lanczos vectors are accumulated, and finally an explicit projection of the system matrices is performed in Step (8). This is necessary because the tridiagonal matrix constructed on the fly during the three-term Lanczos recurrence (Step (5)) is no longer the correct subspace projection, due to the approximate character of the Lanczos iteration in ISIL.

The ISIL algorithm, which relies on approximate solves (with tolerance ϵ as loose as 0.1 or 0.01), stands in stark contrast with standard Krylov solvers such as the Implicitly Restarted Arnoldi Method [1] (implemented in the ARPACK library [2]), which require high-accuracy linear solves. If direct solvers are used, solution accuracy reaches machine precision; if iterative solvers are used, the accuracy of the linear solves must be commensurate with the target accuracy of the computed eigenvalues and eigenvectors (see the ARPACK User's Guide [2]).² Inexact solves bring substantial savings during iterative linear solves (typically performed by Krylov subspace algorithms such as Conjugate Gradient (CG) or MINRES) by lowering their iteration count. On the cost side, ISIL requires extra complete vector reorthogonalization and an explicit matrix projection, yet the cost of these operations is normally far outweighed by the computational savings in the linear-solve phase. More importantly, ISIL exhibits early stagnation as compared to solvers based on exact solves, which often effectively prevents one from obtaining a solution with the desired accuracy. To overcome this fundamental algorithmic limitation, ISIL should normally be complemented by a solution-refinement phase once stagnation occurs. (ISIL stagnation can be detected by comparing residuals from subsequent iterations: once the residual reduction over, e.g., two consecutive iterations becomes much smaller than in previous iterations, ISIL should terminate. ISIL stagnation for different eigenpairs is visible in Fig. 5.) One efficient refinement strategy to complement ISIL is based on the Inexact Jacobi Orthogonal Component Correction (IJOCC) algorithm (also proposed in [12,16]), described in the following section.
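The structure of the ISIL loop can be sketched as follows. This is a simplified numpy sketch of our reading of Algorithm 1, not the paper's pseudo-code: `cg_solve` and `isil_basis` are hypothetical helper names, and the 'inexact' part is emulated by a conjugate-gradient solve stopped at a loose relative tolerance.

```python
import numpy as np

def cg_solve(A, b, rtol, maxiter=1000):
    """Plain conjugate gradient, stopped once ||r|| <= rtol * ||b||.
    The loose stopping rule is what makes the outer iteration 'inexact'."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    stop = (rtol * np.linalg.norm(b)) ** 2
    for _ in range(maxiter):
        if rs <= stop:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def isil_basis(K, M, sigma, m, tol=1e-1):
    """Sketch of an inexact shift-invert Lanczos basis builder: inexact
    solve, complete reorthogonalization, and accumulation of the basis
    for a later explicit projection by the caller."""
    A = K - sigma * M
    q = np.random.default_rng(1).standard_normal(K.shape[0])
    q /= np.linalg.norm(q)
    Q = [q]
    for _ in range(m - 1):
        w = cg_solve(A, M @ Q[-1], tol)    # inexact shift-invert 'matvec'
        for v in Q:                        # complete reorthogonalization
            w -= (v @ w) * v
        nrm = np.linalg.norm(w)
        if nrm < 1e-12:                    # basis exhausted
            break
        Q.append(w / nrm)
    return np.column_stack(Q)

# Usage on a small hypothetical SPD pencil; the caller projects the system
# matrices explicitly (cf. Step (8)) before solving the reduced problem.
rng = np.random.default_rng(2)
B = rng.standard_normal((40, 40))
K = B @ B.T + 40 * np.eye(40)    # eigenvalues >= 40, so K - 30*M is SPD
M = np.eye(40)
Q = isil_basis(K, M, sigma=30.0, m=12)
Kr, Mr = Q.T @ K @ Q, Q.T @ M @ Q    # explicit projection
```

Note that even though each solve is only accurate to one digit, the explicit reorthogonalization keeps the basis orthonormal, which is what makes the explicit projection step meaningful.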

1. This non-orthogonality of the vectors generated by the ISIL process should be distinguished from the loss of orthogonality that occurs after a few iterations of the exact shift–invert Lanczos method implemented in finite-precision arithmetic, described e.g. in [18].
2. Iterative linear solves are performed to high accuracy in e.g. the IRAM algorithm also in order to avoid introducing additional error into the constructed Hessenberg matrix; this additional error is due to the implicit projection applied. See Section 4.2 for more details.

2.2. Solution refinement

Inexact Jacobi Orthogonal Component Correction (IJOCC) is a Newton-like method which exhibits fast, superlinear convergence to the solution, given a good starting point. IJOCC is a simplification of the Jacobi–Davidson algorithm: since it does not require building a Davidson subspace, it saves memory and computational resources. Pseudo-code for JOCC is listed as Algorithm 2, in a version which, like ISIL, allows inexact linear solves. Even with very approximate solves, IJOCC typically retains a superlinear convergence rate.
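The core of a Jacobi-orthogonal-component correction step can be sketched as follows. This is our reading of the idea behind Algorithm 2, not the paper's exact pseudo-code: `jocc_step` is a hypothetical helper, and the inexact solve is replaced by a least-squares stand-in.

```python
import numpy as np

def jocc_step(K, M, lam, x):
    """One Jacobi-orthogonal-component-style correction step: solve the
    correction equation projected onto the complement of the current
    eigenvector estimate x, then update x and the Rayleigh-quotient
    eigenvalue estimate."""
    r = K @ x - lam * (M @ x)                    # eigen-residual
    P = np.eye(len(x)) - np.outer(x, x)          # projector onto span{x}-perp
    Ap = P @ (K - lam * M) @ P                   # projected (singular) operator
    z, *_ = np.linalg.lstsq(Ap, -r, rcond=None)  # stand-in for an inexact solve
    x_new = x + P @ z                            # correction orthogonal to x
    x_new /= np.linalg.norm(x_new)
    lam_new = (x_new @ (K @ x_new)) / (x_new @ (M @ x_new))
    return lam_new, x_new

# Usage: refine a slightly perturbed eigenpair of a known diagonal pencil
# (hypothetical data with spectrum 1, 2, ..., 10).
rng = np.random.default_rng(5)
K = np.diag(np.arange(1.0, 11.0))
M = np.eye(10)
x = np.zeros(10); x[0] = 1.0
x += 0.01 * rng.standard_normal(10)   # perturb the eigenvector for lambda = 1
x /= np.linalg.norm(x)
lam = x @ K @ x
for _ in range(3):
    lam, x = jocc_step(K, M, lam, x)  # Newton-like: converges rapidly
```

The Newton-like character comes from solving the correction equation in the orthogonal complement of the current iterate; with only a few (even inexact) steps the residual drops by several orders of magnitude when the starting point is good.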


2.3. Finding several eigenvalues

In most cases one is interested in computing more than one eigenvalue from a specified portion of the spectrum. Both ISIL and IJOCC, described in the previous sections, may be enhanced to allow the computation of several eigenpairs. Typical approaches, proposed e.g. in [12], involve applying block versions of the two algorithms for finding a few tightly clustered eigenpairs, or using deflation techniques. ([12] presents two versions of deflation procedures: a simpler 'orthogonalization' deflation and a more advanced 'projection' deflation.) The simple deflation approach makes the computed Lanczos vectors orthogonal to previously found eigenvectors. In the more advanced 'projection' scheme, previously found eigenvectors are deflated directly from the solution subspace during the linear solves. In both approaches subsequent eigenvalues are computed by repeatedly calling ISIL (which deflates all previously found eigenvectors), followed by an IJOCC refinement step. This is briefly summarized in Algorithm 3. An important point to note is that in Step (3) ISIL creates a Krylov subspace anew for each eigenvalue to be computed.

3. Efficient algorithm for finding several eigenvalues

The inexact-solve-and-refinement concept embodied in ISIL and IJOCC is very promising. As observed in [12,15,17], it provides an attractive tradeoff between the accuracy of the solves and the convergence rate, and reduces the total number of linear solves, which typically dominate the overall computational cost. Nevertheless, issues have been reported when using ISIL with IJOCC, such as poor convergence when solving eigenproblems for long accelerator structures [14]. Our experiments also showed nonconvergence of the original ISIL algorithm when Symmetric Gauss–Seidel (Symmetric Successive Over-Relaxation, SSOR) was applied as the preconditioner for the iterative conjugate-gradient (CG) linear solver. Furthermore, as shown in the following sections, the deflation schemes first proposed for ISIL (outlined in the previous section) did not prove very efficient for the considered computational problems. Consequently, this paper proposes the following algorithmic modifications to the original ISIL–IJOCC scheme, which substantially improve its functionality and efficiency:

• The first is to use an advanced multilevel preconditioner, instead of a simple SSOR approach, which provides robust convergence for the considered test problems.
• The second is to use the ISIL process to generate a sufficiently large basis which approximately spans the portion of the eigenspace of interest, compute the entire set of approximate eigenvectors in the ISIL-generated subspace, and only afterwards refine them one by one using the IJOCC algorithm.

Details of the proposed approach are described in the following sections.

3.1. Multilevel preconditioning for FEM-based formulations

Due to the considered problem sizes, the approximate linear solves required by both the ISIL and IJOCC algorithms need to be performed by an iterative solver. The Preconditioned Conjugate Gradient (PCG) Krylov subspace iterative solver is a suitable choice: it is a relatively simple method with modest memory requirements, yet it has proven very effective and efficient [9,10,19–21] for solving large-scale problems, provided an appropriate preconditioning technique is applied. As already mentioned, simple preconditioners cannot be used for the considered class of problems; applying them typically results in a nonconverging or very slowly converging CG algorithm. However, for eigenproblem formulations based on the Finite Element Method (FEM) with higher-order basis functions, one may develop much more effective preconditioners exploiting the concept of multilevel preconditioning. Such advanced multilevel preconditioners, which take advantage of the hierarchy of FEM basis functions, have been proposed and successfully applied [21,22].

For the presented eigensolver, we propose to use a version of the multilevel solver described in [21], with 3 levels of basis hierarchy. Suppose three hierarchy levels were used during the FEM discretization. Then, for the shift–invert problem, the matrix of interest becomes

               [ A00  A01  A02 ]
(K − σM) = A = [ A10  A11  A12 ]
               [ A20  A21  A22 ]

where the Ajk blocks correspond to different orders (or hierarchy levels) of FEM basis functions and σ is the given shift. The multilevel preconditioning operation is shown in Algorithm 4 (the algorithm is initially called with i = 2, i.e. it starts at the highest hierarchy level). The smoothing operation consists of one or more weighted Jacobi or Gauss–Seidel relaxation steps, performed using the diagonal blocks Aii at the different hierarchy levels. In all the tests described below, a weighted Jacobi smoother was used. (A Gauss–Seidel smoother was found to provide similar performance.)
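The hierarchical smoothing-plus-coarse-solve pattern can be sketched as follows. This is our simplified reading of the structure of Algorithm 4, not the paper's exact procedure: `multilevel_apply` is a hypothetical helper, the block partitioning in `levels` is assumed, and the paper's restriction details are not reproduced.

```python
import numpy as np

def multilevel_apply(A, levels, r, omega=0.6, nu=2):
    """Hierarchical preconditioner application: weighted-Jacobi smoothing
    with the diagonal of each block, starting from the highest level
    (i = 2), followed by a direct solve with the lowest-order block A00.
    `levels` lists the index ranges of A00, A11, A22, lowest order first."""
    x = np.zeros_like(r)
    res = r.copy()
    d = np.diag(A)
    for lo, hi in reversed(levels[1:]):        # levels 2, then 1
        for _ in range(nu):                    # weighted Jacobi sweeps
            x[lo:hi] += omega * res[lo:hi] / d[lo:hi]
            res = r - A @ x
    lo, hi = levels[0]                         # coarse (lowest-order) solve
    x[lo:hi] += np.linalg.solve(A[lo:hi, lo:hi], res[lo:hi])
    return x

# Usage as a PCG-preconditioner stand-in on a small, diagonally dominant
# hypothetical matrix: one application should reduce the residual.
rng = np.random.default_rng(6)
n = 12
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T) + 10 * np.eye(n)
levels = [(0, 4), (4, 8), (8, 12)]
b = rng.standard_normal(n)
x = multilevel_apply(A, levels, b)
```

Inside PCG, this operation plays the role of applying the preconditioner inverse to the current residual; the coarse block is small enough for a direct solve, while the higher-order blocks are only smoothed.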


3.2. Finding several eigenvalues through extended approximate eigenbasis generation using inexact shift–invert Lanczos

The basic ISIL algorithm described in Section 2.1 constructs an orthogonal basis of the approximate Lanczos subspace. This basis can be used to find more than one approximate eigenpair from the desired part of the spectrum, provided its size is sufficiently large. Typically, if Nev eigenvalues are to be found, a basis of 2Nev approximate ISIL-generated Lanczos vectors is large enough to compute good eigenvalue approximations. This is illustrated in Fig. 1, which shows the convergence profile for the 9 largest eigenvalues of the projected eigenproblem (cf. Step (10) of Algorithm 1). Note that the size of the projected matrix increases from 1 to 20, so e.g. an approximation of the 9th eigenvalue can only be computed starting with the 9th iteration. One may note that by iteration 18, convergence is reached for all 9 approximate eigenpairs.

Consequently, we propose the following scheme for finding multiple eigenpairs of interest. First, ISIL is used to compute a projection basis of sufficient size (e.g. 2Nev); the projection basis is simply the matrix Z from Step (9) of Algorithm 1. Subsequently, the system matrices are projected, and the Nev eigenvalues closest to the predefined shift σ, with the corresponding eigenvectors, are found for the projected standard eigenvalue problem. Finally, in order to enhance the quality of the eigenpairs computed with this ISIL-generated 'extended' approximate Lanczos basis, the IJOCC algorithm is applied to each eigenpair approximation. The entire scheme is summarized in Algorithm 5. One should stress the key differences between the proposed method and Algorithm 3:

Fig. 1. Convergence profile for 9 computed eigenvalues vs. the size of the basis generated with ISIL (Algorithm 1). The last iteration is done using IJOCC (Algorithm 2).

• Firstly, the proposed method does not apply deflation during ISIL iteration, which implies lower cost of each iteration.

Fig. 2. Modeled resonant cavity, used in high-frequency microwave filter designs.

• Secondly, the IJOCC refinements are performed only after all eigenpair approximations have been found, whereas in the deflation scheme IJOCC is called after each eigenpair approximation is computed by ISIL.

As shown in the following section, the proposed method brings both accuracy and performance gains. Better performance is achieved partly because an appropriate Krylov subspace does not have to be constructed anew for each eigenvalue.
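The overall flow of the extended-basis scheme can be sketched as follows. This is our reading of the idea of Algorithm 5 on hypothetical small dense matrices: the inexact solve is replaced by an exact dense solve for brevity, and the final per-pair IJOCC refinement is omitted.

```python
import numpy as np
from scipy.linalg import eigh

# (1) Run a shift-invert Lanczos-type process for 2*Nev steps with full
# reorthogonalization, (2) project K and M explicitly onto the basis,
# (3) keep the Nev Ritz values closest to the shift.
rng = np.random.default_rng(3)
n, Nev, sigma = 60, 4, 0.0                  # sigma = 0: smallest eigenvalues
B = rng.standard_normal((n, n))
K = B @ B.T + 0.1 * np.eye(n)               # hypothetical SPD 'stiffness'
M = np.eye(n)                               # identity 'mass' matrix

OP = np.linalg.solve(K - sigma * M, M)      # dense shift-invert operator
q = rng.standard_normal(n)
q /= np.linalg.norm(q)
Q = [q]
for _ in range(2 * Nev - 1):                # extended basis of size 2*Nev
    w = OP @ Q[-1]
    for v in Q:                             # complete reorthogonalization
        w -= (v @ w) * v
    Q.append(w / np.linalg.norm(w))
Q = np.column_stack(Q)

Kr, Mr = Q.T @ K @ Q, Q.T @ M @ Q           # explicit projection
theta = eigh(Kr, Mr, eigvals_only=True)     # Ritz values
closest = np.sort(theta[np.argsort(np.abs(theta - sigma))[:Nev]])
```

The key contrast with the deflation scheme is visible in the structure: one basis-building loop serves all Nev eigenvalues, instead of one Krylov process per eigenvalue.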

3.3. Extended approximate Lanczos space size selection

The extended Lanczos basis generation algorithm in its simplest form, presented in Algorithm 5, uses a fixed space size of 2Nev, where Nev is the number of eigenvalues to be computed. While, as shown below, this is a good rule of thumb, a more automatic selection is possible by detecting convergence of the eigenvalues of the projected problem (Kr, Mr) (cf. Step (2) of Algorithm 5) while the basis is generated. To this end, Steps (2) and (3) would need to be performed within the ISIL algorithm (Step (1)) for iterations i ≥ Nev. Basis generation can be terminated when the condition

|λj^(i−1) − λj^(i)| / λj^(i) < tol        (2)

is satisfied for each j = 1, ..., Nev, where i is the iteration number and tol is the target tolerance. This can be done at a relatively modest additional cost of computing (updating) the projected matrix Kr and solving a small eigenproblem. Although it has not been observed in the considered application examples, the simple criterion (2) may occasionally lead to misconvergence if eigenvalues are very tightly clustered. (For an in-depth discussion of the problem, and possible remedies, see [23].)

4. Numerical results

This section presents the results of computational experiments which verify the accuracy of the proposed method and compare its performance to other existing approaches, for a challenging large-scale eigenvalue problem. The test problem selected to evaluate the proposed algorithm concerns modeling the electromagnetic field distribution in a resonant cavity (shown in Fig. 2) encountered in modern high-frequency filter designs. More specifically, one is interested in finding the resonant frequencies of the structure, along with the corresponding modes (distributions) of the E or H fields, in a certain frequency range. This can be achieved by solving an eigenvalue problem for an elliptic differential operator derived from Maxwell's equations:

∇ × (µ⁻¹ ∇ × E) = ω² ϵ E   in Ω        (3)
n̂ × E = 0                  on ∂Ω       (4)

where µ is the permeability, ϵ is the electric permittivity, E is the electric field (the unknown eigenvector), ω² is the squared resonant frequency (the unknown eigenvalue), Ω is the problem domain, described by the geometry of the cavity, and ∂Ω is its boundary.


Fig. 3. Sparsity patterns for K (left) and M (right) matrices.

Transforming the above problem to its weak (variational) form and applying the Finite Element Method (FEM) yields the following discrete generalized eigenproblem:

Kx = ω²Mx        (5)

where both K and M are symmetric positive semidefinite matrices.

There are multiple challenges associated with solving the above eigenproblem. Firstly, since the spatial dimensions of the cavity are large with respect to the wavelengths of the resonating electromagnetic field, FEM discretization using high-order elements of suitable accuracy typically yields sparse matrices of order 10^5–10^7. The test matrices had sizes ranging from 806 811 to 11 014 074. The sparsity pattern for the matrix of size N = 806 811 is shown in Fig. 3. For this matrix, due to the applied third-order FEM basis functions, the average number of nonzero elements per row reaches 70 (with a maximum of 288), which is a relatively large number of couplings, significantly increasing the complexity of the problem. (Third-order functions also couple much more uniformly to lower-order functions, which appear as 'dense' blocks on the spy plot, in the lower and right-hand portions of the matrices.)

Secondly, the spectrum involves tightly clustered eigenvalues spread over a very wide range. It also includes a large number of zero eigenvalues: the corresponding nullspace of the initial operator has dimension ranging from 175 000 to over 2.6 million (for the 11-million problem). Finally, one is typically interested in finding the first few nonzero eigenvalues, which lie in the interior of the spectrum, i.e. they are bounded on one side by a large number of zero eigenvalues corresponding to the operator nullspace, and on the other by most of the nonzero eigenvalues.

Consequently, available eigensolvers are either ineffective or inefficient for the presented large-scale problem. Classical eigensolvers cannot be applied, since they use huge amounts of computer memory (as they need to store relatively dense matrix factors). Finding eigenvalues located far from the edges of the spectrum poses a fundamental problem for basic Krylov subspace algorithms which do not use the shift–invert strategy.
Methods such as the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) are not immediately applicable, due to the large-size nullspace. (The nullspace needs to be filtered out, which may prove not very scalable; cf. Section 4.3 for details.) Finally, Krylov subspace methods employing shift–invert with exact linear solves, such as the Lanczos or Arnoldi (e.g. IRAM) methods, as well as the Jacobi–Davidson method, prove inefficient (cf. Section 4.3).

Table 1
Accuracy of the eigenpairs computed with ISIL–IJOCC and with ISIL alone, for linear solver tolerance set to 1e−1. Relative eigenvalue error is computed as |λ − λref|/|λref|. Relative residual is computed as ∥Kx − λMx∥2/|λ|. Reference eigenvalues (λref) were computed using ARPACK with the PCG solver, with linear solver accuracy set to 1e−8.

Ev no. | ISIL–IJOCC rel. eval. err. | ISIL–IJOCC rel. residual | ISIL rel. eval. err. | ISIL rel. residual
1      | 1.643e−9 | 1.964e−4 | 0.111e−6 | 0.46e−2
2      | 0.151e−9 | 2.640e−4 | 0.091e−6 | 0.56e−2
3      | 3.327e−9 | 4.036e−4 | 0.013e−6 | 0.58e−2
4      | 0.598e−9 | 6.386e−4 | 0.516e−6 | 0.78e−2
5      | 0.583e−9 | 6.913e−4 | 0.556e−6 | 0.90e−2
6      | 0.238e−9 | 3.394e−4 | 0.579e−6 | 0.90e−2
7      | 1.224e−9 | 7.684e−4 | 0.982e−6 | 1.23e−2
8      | 2.128e−9 | 7.092e−4 | 0.668e−6 | 1.07e−2
9      | 1.015e−9 | 8.007e−4 | 1.130e−6 | 1.20e−2
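For concreteness, the two accuracy measures quoted in the table can be computed as follows (a small numpy sketch; `eig_accuracy` is a hypothetical helper name, and the 2×2 pencil is made-up test data):

```python
import numpy as np

def eig_accuracy(K, M, lam, x, lam_ref):
    """Accuracy measures used in Tables 1 and 2: relative eigenvalue error
    |lam - lam_ref| / |lam_ref| and relative residual
    ||K x - lam M x||_2 / |lam| (x assumed normalized)."""
    rel_err = abs(lam - lam_ref) / abs(lam_ref)
    rel_res = np.linalg.norm(K @ x - lam * (M @ x)) / abs(lam)
    return rel_err, rel_res

# Check on an exact eigenpair of a tiny hypothetical pencil.
K = np.diag([2.0, 5.0])
M = np.eye(2)
err, res = eig_accuracy(K, M, 5.0, np.array([0.0, 1.0]), 5.0)
```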

4.1. Accuracy validation for the ISIL–IJOCC method

The algorithm based on ISIL and IJOCC, on the other hand, appears to be an attractive solution for the discussed computational problem. A prototype implementation has been developed in Matlab. First, accuracy validation was performed. In the experiment, a few (10) eigenvalues were to be computed around the target frequency f = 0.9 GHz (with the corresponding input shift σ = 31.977). Table 1 shows the accuracy of the nine computed nonzero eigenvalues (data for the 10th, zero eigenvalue is omitted for clarity). The reference eigenvalues were computed using ARPACK (specifically, the eigs function in Matlab) combined with an iterative PCG linear solver with multilevel preconditioner (with solve accuracy set to 1e−8). The ISIL–IJOCC solver used the same PCG linear solver, but with a much relaxed solver accuracy of 1e−1. The table also shows the impact of the IJOCC refinement. For each eigenvalue only a single IJOCC refinement iteration was performed, which improved eigenvalue quality by 2–3 orders of magnitude and reduced the relative residual by over 1 order of magnitude.

4.2. Impact of explicit projection

In order to validate the ISIL algorithm, the impact of explicit (rather than implicit) matrix projection has also been investigated. In the explicit matrix projection, Kr = Q^T K Q, where Q is the ISIL-generated projection basis (cf. Algorithms 1 and 5). In the classical


Fig. 4. Convergence profile for the 1st eigenvalue computed using ISIL and 'implicit ISIL', which does not perform explicit matrix projection and instead uses the tridiagonal matrix generated during the inexact Lanczos process to find the eigenvalues.

Table 2
Accuracy comparison between ISIL (linear solver tolerance: 1e−1) and inexact-ARPACK (linear solver tolerance: 1e−1). Relative eigenvalue error is computed as |λ − λref|/|λref|. Relative residual is computed as ∥Kx − λMx∥2/|λ|.

Ev no. | ISIL rel. eval. err. | ISIL rel. residual | Inexact-ARPACK rel. eval. err. | Inexact-ARPACK rel. residual
1      | 0.111e−6 | 0.46e−2 | 1.554e−5 | 0.57e−2
2      | 0.091e−6 | 0.56e−2 | 2.420e−5 | 0.56e−2
3      | 0.013e−6 | 0.58e−2 | 0.211e−5 | 0.71e−2
4      | 0.516e−6 | 0.78e−2 | 1.588e−5 | 1.38e−2
5      | 0.556e−6 | 0.90e−2 | 0.428e−5 | 1.22e−2
6      | 0.579e−6 | 0.90e−2 | 2.913e−5 | 1.03e−2
7      | 0.982e−6 | 1.23e−2 | 2.106e−5 | 1.53e−2
8      | 0.668e−6 | 1.07e−2 | 0.453e−5 | 2.67e−2
9      | 1.130e−6 | 1.20e−2 | 2.318e−5 | 1.28e−2

Lanczos method the projection is done implicitly by constructing a tridiagonal matrix K̂r, i.e. a symmetric tridiagonal matrix with diagonal entries α0, α1, ..., αn−1 and sub- and super-diagonal entries β0, β1, ..., βn−2,
where the coefficients αi and βi are computed during the ISIL iterations (cf. Algorithm 1). Note that eigenvalue µi of K̂r corresponds to eigenvalue λi = 1/µi + σ of the initial eigenproblem, where σ is the specified shift. Computation of eigenvalues using the above tridiagonal matrix has also been implemented. Fig. 4 shows a comparison of convergence profiles for a sample (1st) eigenvalue computed using Kr (explicit ISIL) and K̂r (implicit ISIL), confirming the superior accuracy of the explicit projection when the Krylov subspace is constructed with inexact linear solves. For further comparison, and to provide an additional rationale for the explicit projection, we implemented a method named 'inexact-ARPACK', which uses inexact PCG linear solves (as opposed to the 'exact' solves done in the standard IRAM algorithm). Table 2 compares ISIL with this method. Inexact-ARPACK and ISIL are very similar in terms of the basis generated, because both algorithms build approximate Krylov subspaces in the same manner and orthogonalize the basis vectors. Yet ARPACK performs an implicit matrix projection, i.e. it constructs a low-order Hessenberg matrix (a tridiagonal matrix in the symmetric case) and uses it to compute the eigenpairs. In contrast, as already mentioned above, ISIL stores the orthogonal Krylov basis,
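The implicit-projection mapping λ = 1/µ + σ can be illustrated numerically. This is a small dense sketch with hypothetical data and M = I, using the exact shift-invert operator and a full-size Krylov space, so the recovered eigenvalues match up to roundoff (for the inexact solves of ISIL this correspondence breaks down, which is the paper's rationale for explicit projection):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 30, 0.5
B = rng.standard_normal((n, n))
K = B @ B.T + 2 * np.eye(n)                 # eigenvalues >= 2
A = np.linalg.inv(K - sigma * np.eye(n))    # exact shift-invert operator

# Lanczos recurrence coefficients alpha_j, beta_j (with full
# reorthogonalization for numerical stability).
alpha, beta, Q = [], [], []
q = rng.standard_normal(n)
q /= np.linalg.norm(q)
Q.append(q)
for j in range(n):
    Aq = A @ Q[-1]
    alpha.append(Q[-1] @ Aq)                # diagonal entry alpha_j
    w = Aq.copy()
    for v in Q:
        w -= (v @ w) * v
    if j < n - 1:
        beta.append(np.linalg.norm(w))      # off-diagonal entry beta_j
        Q.append(w / beta[-1])

T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
mu = np.linalg.eigvalsh(T)                  # eigenvalues of the tridiagonal
lam = np.sort(1.0 / mu + sigma)             # mapped back: lambda = 1/mu + sigma
```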

Fig. 5. Residual profiles for the 9 computed eigenpairs vs. the size of the basis generated with ISIL (Algorithm 1). The last iteration is done using IJOCC (Algorithm 2).

Table 3
ISIL–IJOCC runtimes for different problem sizes. The tests were performed on a Linux server with 2 AMD Opteron 6174 12-core 2.2 GHz processors and 64 GB of RAM.

Matrix size | Total runtime (s) | Total PCG solve time (s) | Total ISIL time (s) | Total # mat–vecs | No. ISIL iter. | No. IJOCC iter.
806 811     | 2 738  | 2 090  | 1 516  | 786 | 14 | 19
5 296 284   | 18 475 | 13 090 | 9 822  | 620 | 14 | 19
11 014 074  | 46 474 | 33 300 | 22 234 | 653 | 14 | 19

and performs explicit matrix projection. (The rationale is that a Hessenberg matrix is not an exact projection of the initial matrix onto the constructed subspace, when only inexact linear solves are done.) Results shown in Table 2 confirm that explicit matrix projection done by ISIL improves accuracy of the solutions. 4.3. Convergence, performance and scalability for ISIL–IJOCC Convergence, scalability and performance of ISIL–IJOCC with extended basis generation have also been investigated. Figs. 1 and 5 show, respectively, convergence profiles for the nine computed eigenvalues, and the corresponding relative residuals. One should note that approximation for e.g. the 5th eigenvalue is only available beginning with the 5th iteration, since it requires a Krylov subspace of order ≥5. In the experiment the order of approximate Krylov subspace constructed by ISIL was fixed at 20. Fig. 1 shows a fairly stable, typical eigenvalue convergence profile. One may also note stagnation of ISIL occurring around iteration 17. Finally, the graph shows a consistent accuracy improvement due to IJOCC refinement, occurring at iteration 21. A single JOCC step was enough to reach the desired level of accuracy. One should also stress the importance of the applied linear solver (PCG), and particularly of the multilevel preconditioner, which made it possible to obtain convergence of the inexact linear solves, as well as good performance (with the number of PCG iterations (= number of ISIL matvecs) ranging from 15 to 33 (for ISIL), and from 7 to 33 (for IJOCC) for each inexact linear solve with tolerance 1e−1). Attempts to use simpler preconditioning schemes, like symmetric successive over-relaxation (symmetric Gauss–Seidel) resulted in nonconverging inexact linear solves for the considered computational example. On the other hand, using a multilevel preconditioner with other Krylov solvers, e.g. MINRES, resulted in costlier, less efficient solves. 
Overall, the conjugate-gradient solver with multilevel preconditioner appears to be the method of choice for the considered problem.


Table 4
Performance comparison between ISIL–IJOCC, ARPACK, inexact-ARPACK, JDQZ and LOBPCG. N = 806 811. The tests were performed on a workstation with an Intel i7 870, 2.93 GHz processor. The third column shows the approximate averaged relative errors for the 9 computed eigenvalues.

Method                    | Lin. solver tol. | Relative e.value error | Total runtime (s) | Total PCG solve time (s) | Total # mat–vecs | Total # precond. setups
ISIL–IJOCC                | 1e−1   | 1e−9   | 1 068  | 790    | 808    | 15
ISIL                      | 1e−1   | 1e−6   | 581    | 440    | 451    | 1
Exact ARPACK              | 1e−8   | Ref.   | 6 316  | 4 009  | 2 954  | 1
Inexact-ARPACK            | 1e−1   | 1e−5   | 3 652  | 2 111  | 1 624  | 1
Inexact-ARPACK            | 1e−2   | 1e−6   | 7 710  | 4 670  | 3 550  | 1
JDQZ + GMRES              | Varies | 1e−9   | 16 581 | –      | 4 991  | 1
JDQZ + BICGSTAB           | 1e−8   | 1e−9   | 7 135  | –      | 4 145  | 1
LOBPCG                    | –      | Failed | Failed | Failed | Failed | Failed
LOBPCG + Nullspace filter | –      | 1e−9   | 435    | –      | 284    | 1

Fig. 6. Runtimes for ISIL and ISIL–IJOCC for different problem sizes.

Scalability tests have also been performed for the proposed ISIL–IJOCC method by applying progressively finer FE discretizations to the considered test problem. Table 3 shows runtimes, as well as matvec and iteration counts, for 3 different problem sizes. Runtimes are also plotted in Fig. 6. The results suggest very good scalability of the proposed algorithm to large-scale problems. The stable number of matvecs and of ISIL and JOCC iterations confirms an approximately equal convergence rate for all three problem sizes. The largest problem tested had approx. 11 million unknowns; this limit was due to memory usage constraints of the current Matlab implementation and the available hardware.

Table 4 compares performance, in terms of CPU time and the number of matrix–vector products, between the proposed ISIL–IJOCC method and both exact and inexact ARPACK algorithms, the Jacobi–Davidson (JDQZ) method and the LOBPCG algorithm. In this test the same 9 eigenvalues were computed using each method. The ISIL, IJOCC and ARPACK methods used the same implementation of the matvec operation, which involved an iterative linear solve using the preconditioned CG method with multilevel preconditioner. JDQZ used the built-in GMRES solver with a variable linear solver tolerance starting at 0.7. Both JDQZ and LOBPCG used the same multilevel preconditioner as ISIL. The tolerances for the linear iterative solvers are listed in the 2nd column of the table. The 'exact-ARPACK' run with tolerance set to 1e−8 provided the reference result. The eigenvalue tolerances for JDQZ and LOBPCG (not to be confused with the linear solver tolerance) were set to 1e−5 and 3e−2, respectively, in order to match the accuracy of the eigenvalues computed with ISIL–IJOCC. The third column of Table 4 shows the approximate order of the relative error for the nine computed eigenvalues versus the reference result. Significant speedups versus both exact- and inexact-ARPACK (e.g. 6.0X vs. exact-ARPACK) are apparent for ISIL–IJOCC, and are confirmed by the reduced number of required matvec operations, which dominate the computational cost. It is interesting to note that ISIL–IJOCC has a smaller overhead (on top of the matvec cost) than ARPACK, despite ISIL's Matlab implementation involving a larger portion of interpreted execution. (The Matlab 'eigs' function used in the tests calls compiled and optimized ARPACK library subroutines.)

The table also shows that the Jacobi–Davidson algorithm (the JDQZ Matlab implementation by Sleijpen et al. [13] with either the GMRES or BiCGstab built-in linear solver, and the multilevel preconditioner described in Section 3.1) is not competitive for the considered test problem. Neither the high-accuracy solves done with BiCGstab, nor the variable solve tolerance used with GMRES (the solver tolerance is tightened in subsequent Jacobi–Davidson iterations), results in fast convergence, as confirmed by the large overall number of matvecs. As for LOBPCG (the Matlab implementation by Knyazev [24] was used), the basic method is not suitable for the modeled problem, since the large nullspace of the matrix operator causes LOBPCG to converge only to zero eigenvalues. In order for LOBPCG to find the desired eigenvalues, one needs to filter out the matrix nullspace by independently computing the eigenvectors corresponding to zero eigenvalues, which in general may prove to be a very hard task. For the considered electromagnetic problem, we were able to implement specialized code, valid for the applied selection of 3rd order FEM basis functions, which extracts all the eigenvectors spanning the nullspace of the matrix and feeds them to LOBPCG (see [10]). Based on the computed eigenvectors, LOBPCG filters out the nullspace, which for the electromagnetic problem in fact means enforcing the divergence-free condition for subsequent iterates [10] (equivalent to solving a linear system involving a discrete Laplacian, i.e. solving an associated electrostatic problem).
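The nullspace-filtering idea can be illustrated with a toy sketch (Python/NumPy; this is not the paper's FEM code, and a simple shifted power iteration stands in for LOBPCG): every iterate is projected onto the orthogonal complement of a known nullspace basis, so the iteration converges to the smallest nonzero eigenvalue instead of collapsing onto the nullspace. The names `nullspace_filter` and `smallest_nonzero_eig` are hypothetical.

```python
import numpy as np

def nullspace_filter(Z):
    """Return a projector onto the orthogonal complement of range(Z)."""
    Q, _ = np.linalg.qr(Z)                    # orthonormal basis of the nullspace
    return lambda v: v - Q @ (Q.T @ v)

def smallest_nonzero_eig(A, Z, iters=500, seed=0):
    """Shifted power iteration on (c*I - A) with every iterate filtered
    against the known nullspace basis Z of the symmetric PSD matrix A."""
    proj = nullspace_filter(Z)
    n = A.shape[0]
    c = np.linalg.norm(A, 2) + 1.0            # shift so c*I - A is positive definite
    B = c * np.eye(n) - A
    v = proj(np.random.default_rng(seed).standard_normal(n))
    for _ in range(iters):
        v = proj(B @ v)                       # re-filter: keep iterates out of the nullspace
        v /= np.linalg.norm(v)
    return v @ A @ v                          # Rayleigh quotient
```

On a graph Laplacian, whose nullspace is spanned by the constant vector, the filtered iteration recovers the smallest nonzero eigenvalue; without the filter the same iteration converges to the zero eigenvalue, mirroring the behavior of basic LOBPCG described above.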
For the problem of size N = 806 811 the associated discrete Laplacian has size 175 000, and consequently the associated 'electrostatic problem' can be solved inexpensively using a direct linear solver (as is done in the LOBPCG implementation used in the test). Hence, for this problem size LOBPCG provides a very competitive solution, requiring the fewest matvecs (see Table 4). Yet, for the problem with 11 million unknowns the available implementation of LOBPCG was inadequate, since filtering a nullspace of size 2.6 million proved impossible for the embedded direct solver, due to enormous memory usage. To overcome this fundamental problem one needs to implement a specialized preconditioned iterative FEM solver (see e.g. [10]) for the electrostatic problem. Consequently, an 'inner-loop' iteration appears in the LOBPCG eigensolver. The inner-loop solves are done to high accuracy [10], which may result in many additional iterations, and in a loss of scalability of the algorithm. This hidden cost, present in 'solve-free' methods (which avoid shift–invert operations), cannot be neglected for many large-scale eigenproblems, including


those encountered in electromagnetics. Similar considerations as for LOBPCG apply to the JDCG method [8] (applying the basic implementation [25] yields nonconvergence for the considered test problem), or to the JDSYM algorithm (cf. [10]).

4.4. Comparison between extended basis generation approach and deflation scheme

In the final series of tests, the accuracy and performance of two ISIL-based techniques for finding several eigenvalues were compared: the approach using deflation, described in [12], against the method based on extended basis generation, proposed in this paper. Table 5 shows the accuracy of the computed eigenvalues and the residuals for the two techniques. (In both cases the same PCG solver was used, and identical linear solve tolerances and stopping criteria were applied.) In the test, the extended ISIL-generated basis had order 20. For the considered test example, ISIL–IJOCC with extended basis generation yields higher-accuracy results. Although subsequent eigenvalues can be found with less computational effort in the deflation scheme (cf. the number of matvecs and iterations in Table 5), the overall performance of ISIL–IJOCC with extended basis is still significantly better for the specified number of eigenvalues to be found. Table 6 shows an almost 3× speedup achieved by the proposed algorithm, consistent with the total number of matvecs and iterations shown in Table 5.

Table 5
Accuracy and performance comparison: ISIL–IJOCC with deflation vs. ISIL–IJOCC with extended basis generation. The '# Mv' and '# iter.' columns for the extended basis approach only report totals, since in that approach all the eigenvalues are computed 'in block,' unlike in the 'deflation' approach where the eigenvalues are computed one by one. Columns 2–5: ISIL–IJOCC + extended basis; columns 6–9: ISIL–IJOCC + deflation.

Ev. no. | Relative eval. err | Relative residual | # Mv | # iter. | Relative eval. err | Relative residual | # Mv | # iter.
1    | 1.643e−9 | 1.964e−4 | –   | –  | 1.102e−7  | 4.584e−3  | 235  | 10
2    | 0.151e−9 | 2.640e−4 | –   | –  | 5.977e−7  | 8.519e−3  | 315  | 12
3    | 3.327e−9 | 4.036e−4 | –   | –  | 9.235e−8  | 6.252e−3  | 290  | 11
4    | 0.598e−9 | 6.386e−4 | –   | –  | −1.854e−5 | 8.762e−3  | 252  | 9
5    | 0.583e−9 | 6.913e−4 | –   | –  | 6.059e−7  | 9.134e−3  | 258  | 10
6    | 0.238e−9 | 3.394e−4 | –   | –  | 1.072e−7  | 8.369e−3  | 224  | 9
7    | 1.224e−9 | 7.684e−4 | –   | –  | 7.536e−7  | 12.007e−3 | 193  | 8
8    | 2.128e−9 | 7.092e−4 | –   | –  | 3.547e−7  | 9.327e−3  | 159  | 7
9    | 1.015e−9 | 8.007e−4 | –   | –  | 1.887e−6  | 13.555e−3 | 135  | 6
Tot. | –        | –        | 808 | 29 | –         | –         | 2061 | 82

Table 6
Performance comparison between ISIL–IJOCC (with deflation) and ISIL–IJOCC (with extended basis generation). Linear solver accuracy was 1e−1. The tests were performed on a workstation with an Intel i7 870, 2.93 GHz processor.

Method | Total runtime (s) | Total PCG solve time (s) | Total # matvecs | Total # preconditioner setups
ISIL–IJOCC (deflation)  | 3027 | 2530 | 2634 | 15
ISIL–IJOCC (ext. basis) | 1068 | 790  | 808  | 15

5. Conclusion

This paper proposed a technique for finding several eigenvalues of generalized symmetric eigenproblems based on the Inexact Shift–Invert Lanczos (ISIL) method with Inexact-JOCC refinement, and the PCG linear solver with multilevel preconditioner. As confirmed by the results of numerical experiments, the technique can be effectively applied to challenging, large-scale problems characterized by very dense spectra, in particular to modeling resonant cavities with spatial dimensions that are large with respect to the wavelength of the resonating electromagnetic field. The results also show that the proposed scheme based on inexact linear solves delivers superior performance compared to methods which rely on exact linear solves during the shift–invert phase, such as the classical IRAM algorithm. It also appears competitive against methods which perform progressively more accurate iterative solves, such as the tested JDQZ method (with GMRES linear solver and multilevel preconditioner), as indicated by the lower number of matvecs. Furthermore, the inexact shift–invert strategy appears to provide a more scalable alternative to methods which avoid shift–invert solves yet rely on nullspace filtering, such as LOBPCG. While LOBPCG proves very competitive for smaller

problems, the hidden memory and computational cost associated with filtering out the nullspace of the matrix operator can become substantial for larger problems. Finally, the performance of both ISIL and IJOCC suggests the tremendous potential of the inexact-solve concept, which deserves further exploration.

The paper also proposes a method for finding several eigenvalues based on generating an extended projection basis, which for the studied test case provides a cost-efficient alternative to more conventional schemes based on deflation. Finally, an iterative PCG solver with multilevel preconditioning is shown to provide robustness, performance, and scalability for the considered problem.

The technique described in the paper may be extended and explored in various directions. It may readily be enhanced to handle generalized problems with complex matrices. Extensions to nonsymmetric problems are also possible by replacing ISIL with the Inexact Shift–Invert Arnoldi (ISIA) method [12]. Block ISIL and block ISIA schemes may also be considered for problems with even more tightly clustered eigenvalues. Finally, efficient implementation of the proposed method (particularly of mat–vec computations and portions of the preconditioning operation during iterative linear solves) on general-purpose graphics processing units (GPUs) can be explored [19–21].

Acknowledgments

This work was supported by the Polish National Centre for Research and Development under Agreement LIDER/21/148/L1/09/NCBiR/2010. The authors wish to thank the anonymous reviewers for their insightful comments, which were vital to improving this paper.

References

[1] D. Sorensen, in: D.E. Keyes, A. Sameh, V. Venkatakrishnan (Eds.), Parallel Numerical Algorithms, Kluwer, Dordrecht, 1997, p. 119.
[2] R. Lehoucq, D. Sorensen, C.-C. Yang, ARPACK User's Guide: Solution of Large Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. URL: http://www.caam.rice.edu/software/ARPACK/UG.
[3] A. Knyazev, SIAM J. Sci. Comput. 23 (2) (2001) 517.
[4] G. Sleijpen, H. van der Vorst, SIAM J. Matrix Anal. Appl. 17 (1996) 401.
[5] G.L.G. Sleijpen, H.A. van der Vorst, in: S.D. Margenov, P.S. Vassilevski (Eds.), Iterative Methods in Linear Algebra, II, vol. 3, 1996, p. 377.
[6] D. Fokkema, G. Sleijpen, H. van der Vorst, SIAM J. Sci. Comput. 20 (1) (1998) 94.
[7] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, H. van der Vorst (Eds.), Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, 2000.
[8] Y. Notay, Numer. Linear Algebra 9 (2002) 21.
[9] A. Stathopoulos, J. McCombs, ACM Trans. Math. Softw. 37 (2) (2010) 21:1.
[10] P. Arbenz, R. Geus, Appl. Numer. Math. 54 (2005) 107.
[11] G. Golub, Q. Ye, Inexact inverse iterations for the generalized eigenvalue problems, SCCM-99-02 Report, Stanford University.
[12] Y. Sun, The filter algorithm for solving large-scale eigenproblems from accelerator simulations, Ph.D. Thesis, Stanford University, 2003.
[13] D. Fokkema, G. Sleijpen, H. van der Vorst, jdqz.m. URL: http://www.staff.science.uu.nl/~sleij101/index.html.
[14] J. DeFord, B. Held, Proceedings of the Particle Accelerator Conference, PAC, vol. 5, 2003, p. 3554.
[15] A. Cwikla, M. Mrozowski, M. Rewieński, IEEE Trans. Microw. Theory 51 (5) (2003) 1506.


[16] G. Golub, B. McCandless, Y. Sun, Solving large generalized eigenproblems through filtering, 2000 (unpublished).
[17] Y. Sun, N. Folwell, Z. Li, High precision accelerator cavity design using the parallel eigensolver Omega3P, in: Proceedings of the Annual Review of Progress in Applied Computational Electromagnetics, 2002.
[18] B. Parlett, The Symmetric Eigenvalue Problem, SIAM, Philadelphia, 1987.
[19] A. Dziekonski, A. Lamecki, M. Mrozowski, Prog. Electromagn. Res. 116 (2011) 49.
[20] A. Dziekonski, A. Lamecki, M. Mrozowski, IEEE Antennas Wirel. Propag. 10 (2011) 619.
[21] A. Dziekonski, A. Lamecki, M. Mrozowski, IEEE Microw. Wirel. Compon. 21 (1) (2011) 1.
[22] Y. Zhu, A. Cangellaris, Multigrid Finite Element Methods for Electromagnetic Field Modeling, Wiley-Interscience, Hoboken, NJ, 2006.
[23] B. Parlett, Misconvergence in the Lanczos algorithm, Tech. Rep., Center for Pure and Applied Mathematics (CPAM), 1987.
[24] A. Knyazev, lobpcg.m. URL: http://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m.
[25] Y. Notay, jdcg_gep.m. URL: http://mntek3.ulb.ac.be/pub/docs/jdcg/jdcg_gep.m.