
Acceleration of inverse subspace iteration with Newton's method

G. El Khoury^a, Yu.M. Nechepurenko^b, M. Sadkane^a

^a Université de Brest, CNRS - UMR 6205, Laboratoire de Mathématiques de Bretagne Atlantique, 6 avenue Victor Le Gorgeu, CS 93837, 29285 Brest Cedex 3, France
^b Institute of Numerical Mathematics, Russian Academy of Sciences, ul. Gubkina 8, Moscow, 119333, Russia

Article history: Received 26 October 2012; Received in revised form 3 June 2013.

Keywords: Inverse subspace iteration; Newton's method; Eigenvalue; Invariant subspace; Sylvester equation; Preconditioning

Abstract

This work is focused on the computation of the invariant subspace associated with a separated group of eigenvalues near a specified shift of a large sparse matrix. First, we consider the inverse subspace iteration with the preconditioned GMRES method. It guarantees convergence to the desired invariant subspace, but the rate of convergence is at best linear. We propose to use it as a preprocessing stage for a Newton scheme which necessitates, at each iteration, the solution of a Sylvester-type equation, for which an iterative algorithm based on the preconditioned GMRES method is specially devised. This combination results in a fast and reliable method. We discuss the implementation aspects and propose a theory of convergence. Numerical tests are given to illustrate our approach.

1. Introduction

This paper combines the use of inverse subspace iteration and Newton's method to compute the invariant subspace associated with a separated group of p eigenvalues near a given shift σ of a large sparse matrix A. To simplify the notation, we denote by A the shifted matrix A − σI. (Here and throughout the paper, I denotes the identity matrix of appropriate order.) Starting with an n × p matrix X_0 whose columns are orthonormal, the inverse subspace iteration constructs a sequence X_1, X_2, . . . by solving, at step k, the system AX_{k+1} = X_k for X_{k+1} with the preconditioned GMRES method and setting X_{k+1} := ort(X_{k+1}), where ort(X) denotes the result of any column orthonormalization procedure applied to a rectangular matrix X. The columns of X_k span an approximation of the desired invariant subspace.

This method is a simple example of an eigensolver with inner-outer iteration, that is, an eigensolver that requires, at each (outer) iteration, an iterative method (inner iteration) to solve linear systems. Solving the above systems exactly guarantees a linear convergence to the desired invariant subspace. However, this necessitates a sparse direct factorization of A, which may not be feasible in many applications. Therefore, it seems reasonable to use an iterative method to solve the systems inexactly. The main concerns are then: (i) the level of accuracy required for the inner iterations to preserve convergence, and (ii) efficient preconditioning of the inner iterations. Answers to these concerns, in the case of inverse (subspace) iteration, can be found, for example, in [1–5]. The ideas and main conclusions in these references are that linear convergence can be preserved by solving the inner iteration with increasingly higher accuracy. The drawback is that the number of matrix–vector multiplications in the inner iteration may increase as the outer iteration proceeds. An efficient way to stop the increase is based on the concept of tuning, introduced first in [1] in the context of inverse iteration. In this reference, the authors propose a tuned preconditioner, which is a rank-one modification of a standard preconditioner whose action on the current


approximate eigenvector agrees with that of the system matrix. Its use results in a large saving in the total number of inner iterations. A block version of the tuned preconditioner was analyzed theoretically and numerically in [3] for inverse subspace iteration.

To accelerate the linear convergence, we propose to use the approximation obtained with a few inverse subspace iterations as an initial guess for a variant of the Newton method, which is a modification of the Newton scheme proposed in [6]. The modification thus obtained is another example of an eigensolver with inner-outer iteration. It necessitates, at each outer iteration, the solution of a Sylvester-type equation, for which an iterative algorithm based on the preconditioned GMRES method is specially devised. This combination results in a reliable method with a quadratic rate of convergence. The idea of combining inverse subspace iteration with other eigensolvers is not new; it was used in [7,8] as a postprocessing step for Arnoldi's method to stabilize computations of eigenvalues. In the present paper, we use it as a preprocessing stage to secure the local quadratic convergence of Newton's method.

The paper is organized as follows. In Section 2, we briefly recall the inverse subspace iteration with standard and tuned preconditioners for the inner iterations. In Section 3 and Appendix A, we consider the Newton method and discuss its convergence properties; in particular, we prove the quadratic convergence in a more general context. In Section 4 and Appendix B we explain how to solve the Sylvester equations arising in the Newton iterations. Section 5 presents an algorithm which combines inverse subspace iteration and Newton's method. Section 6 illustrates the proposed approach on matrices arising from a discretization of a 2D elliptic operator. A conclusion is given in Section 7.

2. Inverse subspace iteration

Consider the following version of inverse subspace iteration, where X_0^* denotes the conjugate transpose of X_0:

Algorithm 1 (Inexact Inverse Subspace Iteration). Given ε > 0, δ_1 > 0, δ_2 > 0 and X_0 ∈ C^{n×p} with X_0^* X_0 = I. For k = 0, 1, . . .
1. Compute A_k = X_k^* A X_k and R_k = A X_k − X_k A_k.
2. Test the convergence: if ∥R_k∥_2 ≤ ε then X_out = X_k and exit.
3. Solve for X_{k+1} the system A X_{k+1} = X_k inexactly such that

   ∥X_k − A X_{k+1}∥_2 ≤ ρ_k = min{δ_1, δ_2 ∥R_k∥_2}.   (1)

4. Set X_{k+1} := ort(X_{k+1}).
End

The block linear systems to be solved are given in step 3. The matrix X_{k+1} in this step can be obtained, for example, column by column by applying GMRES so that (1) is satisfied [9]. However, to accelerate the computations, preconditioned GMRES should be used. Then X_{k+1} is obtained by applying GMRES to the right-preconditioned block system so that

∥X_k − A P_k^{-1} Z_k∥_2 ≤ ρ_k,   X_{k+1} = P_k^{-1} Z_k,   (2)

where P_k is an approximation of A. The analysis developed in [3] shows that when no preconditioner is used, the number of iterations in GMRES does not depend on k provided that the right-hand side X_k spans an approximate invariant subspace of A and the stopping criterion in GMRES gradually decreases as in (1). However, when a preconditioner is used and the systems are solved as in (2), a growth in the number of iterations needed by GMRES can be observed. The growth can be stopped using the concept of tuning mentioned in the introduction. It consists in choosing, at iteration k, a preconditioner P_k such that span(X_k) "tends to" an invariant subspace of A P_k^{-1}. A preconditioner which satisfies this condition is given by (see [3, Theorem 4.5])

P_k = P (I − X_k X_k^*) + A X_k X_k^* = P + (A − P) X_k X_k^*.   (3)

Let

P = Π_L^{-1} L U Π_U^{-1} ≈ A   (4)

be an incomplete LU factorization of A, where Π_L and Π_U are permutation matrices, and L and U are lower and upper triangular matrices, respectively. Then, applying the Woodbury inversion formula [10] to the expression on the right-hand side of (3), we obtain the formula

P_k^{-1} = Π_U U^{-1} (I + V_k W_k) L^{-1} Π_L   (5)

with W_k = X_k^* Π_U U^{-1} and V_k = Ṽ_k (I − W_k Ṽ_k)^{-1}, where Ṽ_k = U Π_U^{-1} X_k − L^{-1} Π_L A X_k.

We will assume further that Algorithm 1 always uses the right preconditioning (2), (5) for GMRES to solve the block linear system in step 3.
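For illustration, a minimal NumPy/SciPy sketch of Algorithm 1 is given below. It is not the authors' implementation: the tuned preconditioner is applied through the Sherman–Morrison–Woodbury identity in the generic form P^{-1}v − P^{-1}(A − P)X (I + X^* P^{-1}(A − P)X)^{-1} X^* P^{-1}v, which is algebraically equivalent to (5) when P is the ILU factorization (4); all function names, the drop tolerance and the GMRES keyword names are assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def tuned_preconditioner(ilu, A, X):
    """Return v -> P_k^{-1} v for P_k = P + (A - P) X X^*, applied via Woodbury;
    ilu.solve plays the role of P^{-1}."""
    AX = A @ X
    PinvAX = np.column_stack([ilu.solve(col) for col in AX.T])
    U = PinvAX - X                                          # P^{-1} (A - P) X
    S = np.linalg.inv(np.eye(X.shape[1]) + X.conj().T @ U)  # small p-by-p matrix
    def apply(v):
        w = ilu.solve(v)                                    # P^{-1} v
        return w - U @ (S @ (X.conj().T @ w))
    return apply

def inexact_inverse_subspace_iteration(A, X0, eps=1e-10, delta1=1e-4, delta2=1e-2, maxit=200):
    """Sketch of Algorithm 1 with preconditioned GMRES inner solves."""
    X, _ = np.linalg.qr(X0)
    ilu = spla.spilu(sp.csc_matrix(A), drop_tol=1e-3)       # P ~ A, cf. (4)
    for _ in range(maxit):
        AX = A @ X
        R = AX - X @ (X.conj().T @ AX)
        if np.linalg.norm(R, 2) <= eps:
            break
        rho = min(delta1, delta2 * np.linalg.norm(R, 2))    # inner tolerance, cf. (1)
        Pk = spla.LinearOperator(A.shape, matvec=tuned_preconditioner(ilu, A, X))
        Xnew = np.empty_like(X)
        for j in range(X.shape[1]):
            # columns of X have unit norm, so the relative tolerance mimics (1);
            # recent SciPy spells the tolerances rtol/atol, older versions use tol=
            Xnew[:, j], _ = spla.gmres(A, X[:, j], M=Pk, rtol=rho, atol=0.0)
        X, _ = np.linalg.qr(Xnew)                           # X_{k+1} := ort(X_{k+1})
    return X
```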

3. The Newton method

In this section we present a Newton-type method for approximating the invariant subspace of a given matrix A associated with a group of its eigenvalues with smallest magnitude. The method described in the previous section will be used as a preprocessing to obtain a good initial guess.


Let M be a square matrix of order n and Y_0 be an n-by-p matrix whose columns are orthonormal and span an approximate invariant subspace of M corresponding to a separated group of p eigenvalues. In [6] the following process for improving the subspace was considered. For k = 0, 1, . . . , compute

C_k = (Y_k^* M Y_k)^{-1},   ∆_k = Y_k − M Y_k C_k,   (6)

solve the Sylvester equation

Ψ_k − (I − Y_k Y_k^*) M Ψ_k C_k = ∆_k   (7)

for Ψ_k and set

Y_{k+1} = ort(Y_k − Ψ_k).   (8)
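To make the scheme concrete, here is a small dense sketch of one correction step (6)–(8); the Sylvester-type equation (7) is solved by brute-force Kronecker vectorization, which is only sensible for small illustrative examples (the function name is invented).

```python
import numpy as np

def newton_subspace_step(M, Y):
    """One correction step (6)-(8) for a small dense matrix M and orthonormal Y.
    Equation (7) is solved exactly by Kronecker vectorization."""
    n, p = Y.shape
    C = np.linalg.inv(Y.conj().T @ M @ Y)             # C_k = (Y_k^* M Y_k)^{-1}
    Delta = Y - M @ Y @ C                             # Delta_k = Y_k - M Y_k C_k
    Pi = np.eye(n) - Y @ Y.conj().T                   # I - Y_k Y_k^*
    # (7): Psi - (Pi M) Psi C = Delta  <=>  (I - C^T kron (Pi M)) vec(Psi) = vec(Delta)
    K = np.eye(n * p) - np.kron(C.T, Pi @ M)
    Psi = np.linalg.solve(K, Delta.reshape(-1, order='F')).reshape(n, p, order='F')
    Q, _ = np.linalg.qr(Y - Psi)                      # Y_{k+1} = ort(Y_k - Psi_k)
    return Q
```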

The idea of block Newton methods for eigenvalue problems is not new. See, for example, [11], where a block version of Newton's method is proposed for a symmetric matrix, and [12], where another block Newton method is proposed for nonlinear eigenproblems and therefore for ordinary linear nonsymmetric eigenproblems as a special case. Nevertheless, the algorithm (6)–(8) offers some advantages. Unlike [11], it can be applied to nonsymmetric eigenproblems and has a simple convergence theory. On the other hand, it cannot be derived from [12]. This is typical of Newton-type methods, where different nonlinear equations can be written for the same problem giving rise, after linearization, to different schemes which are not algebraically equivalent. Furthermore, the algorithm does not use Rayleigh quotient iterations and its main step consists in solving the Sylvester equation whose solution is discussed in Section 4.

To compute the eigenvalues of A with smallest magnitude, we will apply algorithm (6)–(8) to the inverse matrix A^{-1}, whose spectrum has a larger gap between the desired eigenvalues and the others than that of the initial matrix. However, a direct use of the algorithm entails solving linear systems with A with high accuracy. To avoid this drawback, we will use simultaneously two subspaces span(X_k) and span(Y_k), where Y_k = A X_k and Y_k^* Y_k = I. Both subspaces will converge to the same invariant subspace, but span(X_k) approximates the desired invariant subspace better than span(Y_k) and for this reason we will use it as the result of the iterations. Furthermore, the use of span(Y_k) avoids additional multiplications by A during the computations.

The change of variables M = A^{-1}, Φ_k = A^{-1} Ψ_k and X_k = A^{-1} Y_k in (6)–(8) leads to the following scheme. Choose an n-by-p matrix X_0 with A^*A-orthonormal columns such that Y_0 = A X_0 (with Y_0^* Y_0 = I) spans an approximate invariant subspace of A corresponding to its p eigenvalues with smallest magnitude. Then for k = 1, 2, . . . , compute

C_k = (Y_k^* X_k)^{-1},   ∆_k = Y_k − X_k C_k,   (9)

solve the Sylvester equation

A Φ_k − (I − Y_k Y_k^*) Φ_k C_k = ∆_k   (10)

for Φ_k, compute

X̃_{k+1} = X_k − Φ_k,   (11)

compute the QR-decomposition A X̃_{k+1} = Y_{k+1} S_{k+1}, with Y_{k+1}^* Y_{k+1} = I, and set X_{k+1} := X̃_{k+1} S_{k+1}^{-1}.

From (9) and (10), we see that Y_k^* ∆_k = 0 and Y_k^* A Φ_k = 0. Thus, the system (10) can be written as

(I − Y_k Y_k^*)(A Φ_k − Φ_k C_k) = (I − Y_k Y_k^*) ∆_k,   Z_k^* Φ_k = 0,   (12)

where Z_k = ort(A^* Y_k).

The following theorem, which shows quadratic convergence of the approximate invariant subspace 𝒴_k = span(Y_k) to the exact invariant subspace 𝒴 of the matrix M, was proved for (6)–(8) in [6] for the case when the desired eigenvalues lie outside and the others lie inside the unit disk of the complex plane.

Theorem 3.1. If 𝒴_0 is sufficiently close to 𝒴, then for all k = 0, 1, . . . we have

tan ∠(𝒴_{k+1}, 𝒴) ≤ c tan² ∠(𝒴_k, 𝒴),   (13)

where c is a positive constant depending only on the matrix M and the invariant subspace 𝒴.

In this theorem tan ∠(𝒵, 𝒴) denotes the tangent of the largest canonical angle between the subspaces 𝒵 and 𝒴 (see, e.g., [10, p. 584]). The proof in [6] uses the notion of integral criteria for spectral dichotomy (see [13, Chapter 9]), which is convenient for discrete analogs of completely continuous operators but makes the proof difficult to follow in the general case. We propose, in Appendix A, a simple proof using the separation between two matrices [14, p. 231], which is more popular. Moreover, unlike the proof in [6], the present one applies to any separated part of the spectrum.

The method described in (9), (12), (11) is algebraically equivalent to (6)–(8) with M = A^{-1}. Therefore, Theorem 3.1 guarantees its fast convergence provided that we solve the Sylvester equation (12) with sufficient accuracy and start the Newton iterations with a good guess. A convergence proof for an inexact variant of (9), (12), (11) can be found in [15], but it does not apply to the present case.
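Reusing the `newton_subspace_step` sketch above, the quadratic decrease predicted by Theorem 3.1 can be observed on a small synthetic example (the matrix, eigenvalue distribution and perturbation size below are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
n, p = 30, 3
D = np.diag(np.r_[np.linspace(3.0, 4.0, p), np.linspace(0.1, 0.5, n - p)])
S = np.eye(n) + 0.02 * rng.standard_normal((n, n))    # mildly non-normal eigenbasis
M = S @ D @ np.linalg.inv(S)                          # desired eigenvalues: 3.0 ... 4.0
Y_exact, _ = np.linalg.qr(S[:, :p])                   # exact invariant subspace
Y, _ = np.linalg.qr(Y_exact + 0.01 * rng.standard_normal((n, p)))
for k in range(5):
    print(k, np.tan(subspace_angles(Y, Y_exact).max()))   # tangent of the largest angle
    Y = newton_subspace_step(M, Y)
```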


4. Solving the Sylvester equation

System (12) with an arbitrary square p × p matrix C_k can be transformed to a system of the same form as (12) but with an upper triangular matrix C_k. Computing the Schur decomposition

Y_k^* X_k = Q_k T_k Q_k^*,   (14)

where Q_k is unitary and T_k is upper triangular (see, e.g., [10]), substituting C_k = Q_k T_k^{-1} Q_k^* into (12) and multiplying both equations by Q_k on the right, we obtain

(I − Y_k Y_k^*)(A Φ_k Q_k − Φ_k Q_k T_k^{-1}) = (I − Y_k Y_k^*) ∆_k Q_k,   Z_k^* Φ_k Q_k = 0.

Now, the change of variables C_k := T_k^{-1}, Φ_k := Φ_k Q_k and ∆_k := ∆_k Q_k leads to (12) with a matrix C_k of the desired form. Note that we use only unitary transformations, which ensures computational stability. Denoting by Φ_{kj} and ∆_{kj} the j-th columns of Φ_k and ∆_k, respectively, and by c_{ij}^{(k)} the entry (i, j) of the upper triangular matrix C_k, the system (12) can be written as

(I − Y_k Y_k^*)(A − c_{jj}^{(k)} I) Φ_{kj} = Ω_{kj},   Z_k^* Φ_{kj} = 0,   j = 1, 2, . . . , p,   (15)

where

Ω_{k1} = (I − Y_k Y_k^*) ∆_{k1},   Ω_{kj} = (I − Y_k Y_k^*) ( ∆_{kj} + Σ_{i=1}^{j−1} c_{ij}^{(k)} Φ_{ki} ),   j > 1.

Each system in (15) can be solved using the right preconditioning

B_k Γ_{kj} = Ω_{kj},   Φ_{kj} = L_k Γ_{kj},

where

B_k = (I − Y_k Y_k^*)(A − c_{jj}^{(k)} I) L_k   (16)

and L_k is a preconditioning matrix. Applying GMRES to the preconditioned system, we find an approximate solution Γ̂_{kj} which satisfies

∥Ω_{kj} − B_k Γ̂_{kj}∥_2 ≤ δ ∥∆_k∥_2,   (17)

where δ is a given tolerance. To guarantee the second equality in (15), the preconditioning matrix L_k has to satisfy the equality L_k = (I − Z_k Z_k^*) L_k. On the other hand, the approximate solution at each step of GMRES belongs to the Krylov subspace generated by the matrix B_k, which satisfies B_k = (I − Y_k Y_k^*) B_k, and the initial residual r_{kj}^0 = Ω_{kj} − B_k Γ̂_{kj}^0, which satisfies r_{kj}^0 = (I − Y_k Y_k^*) r_{kj}^0; therefore the result does not change (in exact arithmetic) if we use the matrix L_k (I − Y_k Y_k^*) instead of L_k. Thus, the most general form of a suitable matrix L_k is

L_k = (I − Z_k Z_k^*) L̃_k (I − Y_k Y_k^*),   (18)

where L̃_k is an n × n matrix, and the problem of choosing the preconditioning matrix L_k is reduced to that of choosing L̃_k. As explained in Appendix B, a good choice for this matrix is L̃_k = P^{-1}, where P is, for instance, the incomplete LU factorization (4).

Note that if A is real, then the real Schur decomposition can be used in (14), where Q_k is orthogonal and T_k is real upper quasi-triangular (see, e.g., [10]); this leads to linear systems with one-column real right-hand sides and to Sylvester equations with two-column real right-hand sides. In the latter case, we use a 2 × 2 complex Schur decomposition to transform the Sylvester equation into two complex linear systems with one-column complex right-hand sides and preconditioned matrices of the form

(I − Y_k Y_k^*)(A − c I)(I − Z_k Z_k^*) P^{-1} (I − Y_k Y_k^*),

where c is a complex scalar and all matrices are real. Since a multiplication of this matrix by a complex vector is twice as expensive as in the case when c and the vector are real, the total cost of solving the Sylvester equations increases by at most a factor of two compared with the case when all eigenvalues of all matrices C_k are real. After solving these systems, we return to real computations by the inverse transformation.
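The column-by-column solution of (15) with the preconditioner defined by (16)–(18) can be sketched as follows. This is an illustrative sketch, not the authors' code: the names are invented, `ilu` stands for any object applying P^{-1} (for example the result of `scipy.sparse.linalg.spilu`), C is assumed already (quasi-)triangular as produced by the Schur transformation (14), and recent SciPy spells the GMRES tolerances rtol/atol while older versions use tol.

```python
import numpy as np
import scipy.sparse.linalg as spla

def solve_projected_sylvester(A, Y, Z, C, Delta, ilu, delta=1e-4):
    """Column-by-column solution of (15) with right preconditioning (16)-(18)."""
    n, p = Delta.shape
    proj_Y = lambda v: v - Y @ (Y.conj().T @ v)        # I - Y_k Y_k^*
    proj_Z = lambda v: v - Z @ (Z.conj().T @ v)        # I - Z_k Z_k^*
    def P_inv(v):                                      # P^{-1} v (real ILU, complex v)
        return ilu.solve(np.real(v)) + 1j * ilu.solve(np.imag(v))
    L_apply = lambda v: proj_Z(P_inv(proj_Y(v)))       # L_k of (18) with Ltilde_k = P^{-1}
    Phi = np.zeros((n, p), dtype=complex)
    tol = delta * np.linalg.norm(Delta, 2)             # stopping criterion (17)
    for j in range(p):
        def B_matvec(v, c=C[j, j]):                    # B_k of (16)
            w = L_apply(v)
            return proj_Y(A @ w - c * w)
        B = spla.LinearOperator((n, n), matvec=B_matvec, dtype=complex)
        rhs = proj_Y(Delta[:, j] + Phi[:, :j] @ C[:j, j])    # Omega_kj
        gamma, _ = spla.gmres(B, rhs, rtol=1e-12, atol=tol)
        Phi[:, j] = L_apply(gamma)                     # Phi_kj = L_k Gamma_kj
    return Phi
```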


Fig. 1. Convergence behavior of Algorithm 2 for m = 200 (left) and 300 (right), δ = 10−4 .

5. Practical Newton's method

The matrices X_k and Y_k computed with the algorithm described in Section 3 satisfy the equality Y_k = A X_k. Therefore, the invariant subspace corresponding to the p eigenvalues with smallest magnitude is better approximated by span(X_k) than by span(Y_k). Thus, once X̃_{k+1} in (11) is computed, it seems reasonable to compute X_{k+1} = ort(X̃_{k+1}), Y_{k+1} = A X_{k+1} and check whether the residual R_{k+1} = Y_{k+1} − X_{k+1}(X_{k+1}^* Y_{k+1}) is small enough. If it is not, we compute the QR-decomposition Y_{k+1} = V_{k+1} S_{k+1}, set Y_{k+1} := V_{k+1} and X_{k+1} := X_{k+1} S_{k+1}^{-1} to make the columns of Y_{k+1} orthonormal (and, therefore, those of X_{k+1} A^*A-orthonormal), and continue the Newton iterations. An algorithm which realizes this modification and solves the Sylvester equation with the method described in Section 4 is given below.

Algorithm 2 (Practical Newton's Method). Given ε > 0, δ > 0 and X_0 ∈ C^{n×p} with X_0^* X_0 = I, set Y_0 = A X_0. For k = 0, 1, . . .
1. Compute R_k = Y_k − X_k(X_k^* Y_k).
2. Test the convergence: if ∥R_k∥_2 ≤ ε then exit.
3. Compute the decomposition Y_k = V_k S_k, where V_k^* V_k = I, and set X_k := X_k S_k^{-1}, Y_k := V_k.
4. Compute the Schur decomposition Y_k^* X_k = Q_k T_k Q_k^* and set C_k = T_k^{-1}.
5. Set X_k := X_k Q_k, Y_k := Y_k Q_k.
6. Compute ∆_k = Y_k − X_k C_k and Z_k = ort(A^* Y_k).
7. Solve the system (12) inexactly for Φ_k with the method described in Section 4.
8. Compute X_{k+1} = ort(X_k − Φ_k) and Y_{k+1} = A X_{k+1}.
End
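A compact sketch of Algorithm 2 in Python/NumPy is given below, for illustration only: the names and tolerances are invented, the eigenvalue ordering in the Schur step (discussed in the next paragraph) is omitted, and `solve_sylvester(A, Y, Z, C, Delta)` stands for any inexact solver of (12), such as the GMRES-based sketch of Section 4 (extra arguments like the ILU factor can be bound beforehand, e.g. with `functools.partial`).

```python
import numpy as np
import scipy.linalg as sla

def practical_newton(A, X0, solve_sylvester, eps=1e-10, maxit=20):
    """Sketch of Algorithm 2 (practical Newton's method)."""
    X, _ = np.linalg.qr(X0)
    Y = A @ X
    for _ in range(maxit):
        R = Y - X @ (X.conj().T @ Y)                          # step 1
        if np.linalg.norm(R, 2) <= eps:                       # step 2
            break
        V, S = np.linalg.qr(Y)                                # step 3
        X, Y = X @ np.linalg.inv(S), V
        T, Q = sla.schur(Y.conj().T @ X, output='complex')    # step 4
        C = np.linalg.inv(T)
        X, Y = X @ Q, Y @ Q                                   # step 5
        Delta = Y - X @ C                                     # step 6
        Z, _ = np.linalg.qr(A.conj().T @ Y)
        Phi = solve_sylvester(A, Y, Z, C, Delta)              # step 7: inexact solve of (12)
        X, _ = np.linalg.qr(X - Phi)                          # step 8
        Y = A @ X
    return X
```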

To minimize the number of iterations in Algorithm 2, the Schur decomposition computed in step 4 needs to have the diagonal entries of T_k ordered in non-increasing order of magnitude. Algorithm 2 needs a good initial guess to converge quadratically to the desired invariant subspace. For this reason we propose to carry out a few iterations of Algorithm 1 and set X_0 = X_out; see the next section.

6. Numerical experiments

In this section we report some numerical experiments illustrating the behavior of Algorithm 2. The test problem is the elliptic operator given by

u ∂/∂x − v ∂/∂y + µ∆

in the square (x, y) ∈ (0, 1) × (0, 1), with Dirichlet boundary conditions, where ∆ denotes the 2D Laplace operator, µ = 5 × 10^{-4} and

u = ∂ϕ/∂y,   v = −∂ϕ/∂x,   ϕ(x, y) = (1/(4π)) cos(2π x²) cos(2π y²).
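For reference, the n = m² matrix used in the experiments (uniform grid and second-order central differences, as described in the next paragraph) can be assembled roughly as follows. This is a sketch under the sign convention written above, not the authors' code; the 5-point stencil it produces reproduces the nnz(A) counts reported later in Table 2.

```python
import numpy as np
import scipy.sparse as sp

def build_test_matrix(m, mu=5e-4):
    """Finite-difference matrix for mu*Laplacian + u d/dx - v d/dy on (0,1)^2,
    Dirichlet boundary conditions, m inner nodes per direction (n = m^2)."""
    h = 1.0 / (m + 1)
    pts = np.arange(1, m + 1) * h
    X, Y = np.meshgrid(pts, pts, indexing='ij')                 # inner grid nodes
    # u = d(phi)/dy, v = -d(phi)/dx for phi = cos(2*pi*x^2)*cos(2*pi*y^2)/(4*pi)
    u = -Y * np.cos(2 * np.pi * X**2) * np.sin(2 * np.pi * Y**2)
    v = X * np.sin(2 * np.pi * X**2) * np.cos(2 * np.pi * Y**2)
    I = sp.identity(m)
    T = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(m, m)) / h**2   # 1D Laplacian
    D = sp.diags([-1.0, 1.0], [-1, 1], shape=(m, m)) / (2 * h)        # central difference
    Lap = sp.kron(T, I) + sp.kron(I, T)                         # 2D Laplacian
    Dx = sp.kron(D, I)                                          # d/dx (x is the first index)
    Dy = sp.kron(I, D)                                          # d/dy
    A = mu * Lap + sp.diags(u.ravel()) @ Dx - sp.diags(v.ravel()) @ Dy
    return sp.csr_matrix(A)
```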

Using a uniform grid with m inner nodes in each direction and the standard second-order approximation, the resulting matrix A is of order n = m². Throughout the experiments we look for the invariant subspace corresponding to the p = 8 eigenvalues with smallest magnitude (so we use the shift σ = 0 and apply the algorithm to the real matrix A) and present numerical results for m = 200, 300 and 400. The matrix X_0 in Algorithm 2 is obtained with a few iterations of Algorithm 1 started with a random, but fixed for each m, subspace of dimension 8. This operation will be referred to as the preprocessing stage. For different values of the given tolerance ε_SI we will show: the number of iterations k_SI used by Algorithm 1 to obtain the matrix X_0 for which the initial residual R_0 satisfies ∥R_0∥_2 ≤ ε_SI; the number of iterations k used in Algorithm 2 to reach the final residual R_k, and its norm ∥R_k∥_2; the numbers S_SI and S_NM of solutions of preconditioned systems with one-column real right-hand sides needed, respectively, for the preprocessing stage and for Algorithm 2; and their total number S_T = S_SI + S_NM. These numbers are good indicators of the costs of Algorithms 1 and 2.

The choice of X_0 is crucial for the success of Algorithm 2 since the convergence is locally quadratic provided that the columns of X_0 already span an approximation of the desired invariant subspace. Fig. 1 demonstrates it for m = 200 (left)


Table 1
Detailed information on convergence, Algorithm 2, δ = 10^{-4}.

  m     ε_SI    k_SI   ∥R_0∥_2        S_SI    k   ∥R_k∥_2         S_NM    S_T
  200   10^-1   13     7.4 × 10^-2    621     5   3.2 × 10^-12    788     1409
  200   10^-2   22     9.4 × 10^-3    906     5   6.7 × 10^-13    653     1559
  200   10^-3   71     9.7 × 10^-4    2349    3   5.1 × 10^-13    316     2665
  300   10^-1   6      9.0 × 10^-2    529     9   2.0 × 10^-12    2098    2627
  300   10^-2   33     9.6 × 10^-3    1526    4   1.0 × 10^-12    535     2061
  300   10^-3   89     9.7 × 10^-4    3394    3   4.3 × 10^-12    378     3772

Fig. 2. Exact eigenvalues and eigenvalues computed via Algorithm 2 for m = 200 (left) and for m = 300 (right) with ε_SI = 10^{-1}.

Table 2
Numbers of nonzero elements in A, L and U.

  m     nnz(A)     nnz(L)        nnz(U)
  200   199 200    660 885       655 287
  300   448 800    1 488 521     1 470 368
  400   798 400    2 467 964     2 437 090

and 300 (right). It shows the convergence behavior when the initial guess X_0 is obtained as described above using different values of ε_SI. Table 1 shows detailed information on convergence. In the case m = 200 we have quadratic convergence for all values of ε_SI and, as Fig. 2 (left) shows, the eigenvalues computed via Algorithm 2 and by exact inverse subspace iteration (i.e. Algorithm 1 where the systems in step 3 are solved exactly) are the same; therefore Algorithm 2 converges towards the desired invariant subspace. In the case m = 300 we obtain a perturbation of the quadratic convergence when ε_SI = 10^{-1} but a typical Newton quadratic convergence when ε_SI = 10^{-2} or 10^{-3}. Moreover, when ε_SI = 10^{-1}, Algorithm 2 has computed 2 eigenvalues which are not among the 8 smallest in magnitude; see Fig. 2 (right). In the two other cases, the invariant subspace associated with the 8 eigenvalues of smallest magnitude has been computed successfully.

As we have already mentioned, for each value of m, Algorithm 1 was started with a random subspace. By varying this subspace, we have obtained either the quadratic convergence of Newton's method or a perturbation of the quadratic convergence when ε_SI = 10^{-1}. However, the quadratic convergence took place with higher probability when m = 200 than when m = 300. With ε_SI ≤ 10^{-2}, we always had quadratic convergence for both values of m.

The systems in step 3 of Algorithm 1 were preconditioned as explained in (2)–(5), while the systems (15) in step 7 of Algorithm 2 were preconditioned as explained in (16)–(18) with L̃_k = P^{-1}. The matrix P in (3) and (18) was computed via an incomplete LU factorization of the form (4) with a drop tolerance fixed at 10^{-3}. The numbers of nonzero elements in A and in the resulting factors L and U are shown in Table 2. All systems were solved approximately by GMRES using a default value of 50 for the dimension of the Krylov basis. In (1), δ_1 and δ_2 were chosen as 10^{-4} and 10^{-2}, respectively, while in (17), δ was fixed at 10^{-4}. A GMRES solve was terminated when either its stopping criterion was satisfied or the default maximum number of restarts had been reached. The number of GMRES iterations required for each system in the experiments was not larger than 23 for m = 200, and 49 for m = 300. In other words, the restarts were never used.

Figs. 3, 4 (left) and Table 3 show that the convergence of Newton's method transforms from quadratic to linear as the value of δ increases. But, as Fig. 4 (right) shows, even if δ = 10^{-1} it remains significantly faster than that of Algorithm 1, which necessitates 466 iterations and 15 833 solutions of the preconditioned systems.


Fig. 3. Convergence behavior of Algorithm 2 for δ = 10−3 (left) and δ = 10−2 (right), m = 300.

Fig. 4. Convergence behaviors of Algorithm 2 for δ = 10^{-1} (left) and Algorithm 1 (right), m = 300.

Table 3
Detailed information on convergence, m = 300.

              δ = 10^-1                 δ = 10^-2                 δ = 10^-3
  ε_SI    10^-1   10^-2   10^-3     10^-1   10^-2   10^-3     10^-1   10^-2   10^-3
  k          16       8       7        11       5       4         9       5       3
  S_NM     2550     803     649      2148     544     412      1930     715     332
  S_T      3079    2329    4043      2677    2070    3806      2459    2241    3726

Table 4
Maximum number of GMRES iterations (l_max) for m = 300.

  ε_SI      δ = 10^-1   δ = 10^-2   δ = 10^-3   δ = 10^-4
  10^-1         37          42          46          49
  10^-2         19          24          27          30
  10^-3         19          24          28          31

As was mentioned above for ε_SI = 10^{-1} with δ = 10^{-4}, the invariant subspace associated with the 8 eigenvalues of smallest magnitude was not successfully computed, while for smaller values of ε_SI the invariant subspace was computed successfully. This remark applies also to the cases δ = 10^{-3}, 10^{-2} and 10^{-1}. In the examples above, the number of GMRES iterations required for each system in (15) increases as δ decreases and weakly depends on ε_SI when ε_SI is small enough; see Table 4. This table shows that the number of GMRES iterations with a fixed tolerance δ weakly depends on the iteration of Newton's method when the computed invariant subspace is sufficiently close to the exact invariant subspace. This will be explained in Appendix B.

The next example compares Algorithms 1 and 2 for m = 400. Algorithm 1 needs 481 iterations and 18 745 solutions of the preconditioned systems to reach a residual norm of order 10^{-11}, while, as follows from Table 5 (the ε_SI = 10^{-2} line), the use of Newton's method with ε_SI = 10^{-2} reduces the total number of preconditioned system solves by a factor of about 7 to reach a residual norm of the same order. The eigenvalues computed with both methods coincided with the 8 exact eigenvalues of smallest magnitude. Since the computational cost of the preprocessing stage increases when ε_SI decreases,


Table 5
Detailed information on convergence, m = 400, δ = 10^{-4}.

  m     ε_SI    k_SI   ∥R_0∥_2        S_SI    k   ∥R_k∥_2         S_NM    S_T
  400   10^-2   38     9.9 × 10^-3    2026    4   1.0 × 10^-11    697     2723
  400   10^-3   96     9.8 × 10^-4    4204    3   1.0 × 10^-12    502     4706

the total computational cost increases when Newton's method is started with ε_SI = 10^{-3} (the last line of Table 5) instead of 10^{-2}, but it is still significantly smaller than that of Algorithm 1. For this example also, the default value of 50 for the dimension of the Krylov basis was used, and this value was never reached. Note that in the considered case the initial matrix A and the shift are real but A has complex eigenvalues. Thus, the inverse subspace iteration needs only real computations, while Newton's method leads to complex computations (at the stage of solving the Sylvester equation; see the explanation at the end of Section 4) which were twice as expensive. When this is not the case, the use of Newton's method gives an additional acceleration (up to a factor of two).

7. Conclusion

We have proposed a robust version of Newton's method for computing the invariant subspace associated with a separated group of eigenvalues near a given shift of a large sparse matrix. The method uses as an initial guess the approximation obtained from a few inexact inverse subspace iterations with a tuned preconditioner for the inner iterations (preprocessing stage). For solving the Sylvester equation (12), which arises in each step of the Newton method, a new algorithm based on preconditioned GMRES was proposed. The algorithm helps maintain the quadratic convergence of the Newton method if the tolerance for the GMRES iterations is fixed small enough. From the numerical experiments we can draw the following remarks:
– When the preprocessing stage provides a relatively accurate initial guess for the Newton method, the convergence is quadratic towards the wanted invariant subspace.
– When this initial guess is not accurate enough, the convergence is first linear and then changes to quadratic, sometimes towards an unwanted invariant subspace.
– The number of GMRES iterations with a fixed tolerance weakly depends on the Newton iteration when the computed invariant subspace is sufficiently close to the exact invariant subspace.

Acknowledgments

The authors thank the referees for useful remarks and suggestions for improving the paper. This work was supported by the Russian Foundation for Basic Research (Project No. 10-01-00513) and the Russian Academy of Sciences (project "Optimization of numerical algorithms for solving the problems of mathematical physics").

Appendix A. Proof of Theorem 3.1

For the proof we need the following lemma, which can be found in [6].

Lemma A.1. Let

S_1 = span [ I ; 0 ],   S_2 = span [ I ; X ],   S_3 = span [ I ; Z ],

where X and Z are matrices such that ∥X∥_2 ∥X − Z∥_2 < 1, and [ I ; X ] denotes the block column matrix with upper block I and lower block X. Then

tan ∠(S_1, S_2) = ∥X∥_2,   tan ∠(S_2, S_3) ≤ ∥X − Z∥_2 / (1 − ∥X∥_2 ∥X − Z∥_2).   (A.1)
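The first equality in (A.1) is easy to check numerically, for instance with SciPy's canonical-angle routine (an illustrative check with arbitrary sizes, not part of the proof):

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
p, q = 3, 5
X = 0.3 * rng.standard_normal((q, p))
S1 = np.vstack([np.eye(p), np.zeros((q, p))])      # basis of S_1 = span[I; 0]
S2 = np.vstack([np.eye(p), X])                     # basis of S_2 = span[I; X]
theta_max = subspace_angles(S1, S2).max()          # largest canonical angle
assert np.isclose(np.tan(theta_max), np.linalg.norm(X, 2))
```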

Proof of Theorem 3.1. Recall that if U is an arbitrary unitary matrix of order n whose first p columns form an orthonormal basis of an invariant subspace 𝒴 of dimension p corresponding to a separated group of eigenvalues Λ, then

M = U [ M_{11}  M_{12} ; 0  M_{22} ] U^*

with submatrix M_{11} of size p × p whose spectrum is Λ. The 2-norm and the Frobenius norm ∥·∥_F of each submatrix M_{ij} do not depend on the choice of U. Moreover, the separation

sep(M_{11}, M_{22}) = min_{X ≠ 0} ∥X M_{11} − M_{22} X∥_F / ∥X∥_F

(see e.g. [14] for details) between the matrices M_{11} and M_{22} is nonzero and does not depend on the choice of U either.

Let 𝒴̃ be an approximate invariant subspace close to 𝒴 and Ũ be an arbitrary unitary matrix of order n whose first p columns form an orthonormal basis of 𝒴̃. Then

M = Ũ [ M̃_{11}  M̃_{12} ; M̃_{21}  M̃_{22} ] Ũ^*

with submatrix M̃_{11} of size p × p whose spectrum is close to Λ. Due to the separation of Λ from the rest of the spectrum, we have (see [14, Chapter V]):

∥M̃_{21}∥_F → 0,   ∥M̃_{12}∥_F → ∥M_{12}∥_F,   sep(M̃_{11}, M̃_{22}) → sep(M_{11}, M_{22})

as 𝒴̃ → 𝒴. Thus, there exist positive values ε and b such that

∥M̃_{12}∥_F / sep(M̃_{11}, M̃_{22}) ≤ b   (A.2)

for any 𝒴̃ which satisfies the inequality

tan ∠(𝒴̃, 𝒴) ≤ ε.   (A.3)

 



Xk = Uk Yk =

I , 0

Xk = spanXk ,

Y˜k+1 = Yk − Ψk ,

˜ k+1 = spanX˜ k+1 , X

X˜ k+1 = Uk∗ Y˜k+1 ,

X = Uk∗ Y,

and Mk = Uk∗ MUk =



(k)

M11 (k) M21

(k) 

M12 (k) , M22

(k)

with submatrix M11 of size p × p. Note that Yk+1 = spanY˜k+1 and ̸

(Yk , Y) = ̸ (Xk , X),

̸

˜ k+1 , X). (Yk+1 , Y) = ̸ (X

(A.4)

The matrix Ψk satisfies (7) if and only if the matrix Y˜k+1 satisfies the equality Y˜k+1 − (I − Yk Yk∗ )M Y˜k+1 Ck = Yk , which is equivalent to X˜ k+1 − (I − Xk Xk∗ )Mk X˜ k+1 Ck = Xk .

(A.5)

Taking into account that (k)

Ck−1 = Yk∗ MYk = Xk∗ Mk Xk = M11 , it is easy to derive from (A.5) that X˜ k+1 =

  I Z

where Z solves the Sylvester equation (k)

(k)

(k)

ZM11 − M22 Z = M21 .

(A.6)

Since the matrices M and Mk are similar and Y is an invariant subspace of matrix M, we deduce that X is the invariant subspace of matrix Mk corresponding to the same part of the spectrum. If the subspace Yk is sufficiently close to Y, and therefore the subspace Xk is sufficiently close to X, then the subspace X can be represented as

  X = span

I X

where X is an (n − p) × p matrix such that

  Mk

I X

  =

I B X

(A.7)

10

G. El Khoury et al. / Journal of Computational and Applied Mathematics (

)



with some square matrix B of order p. Using the block structure of Mk and excluding B from (A.7) one can derive the following equation for X : (k)

(k)

(k)

(k)

XM11 − M22 X + XM12 X = M21 .

(A.8)

Now, subtracting (A.6) from (A.8) we obtain the equation (k) (k) (k) (X − Z )M11 − M22 (X − Z ) = −XM12 X

which implies

∥X − Z ∥2 ≤ ck ∥X ∥22

(A.9)

where (k)

(k)

(k)

ck = ∥M12 ∥F /sep(M11 , M22 ).

˜ k+1 , we obtain Applying Lemma A.1 with S1 = Xk , S2 = X and, S3 = X tan ̸ (Xk , X) = ∥X ∥2 ,

˜ k+1 , X) ≤ tan ̸ (X

∥X − Z ∥2 1 − ∥ X ∥2 ∥X − Z ∥2

.

(A.10)

Due to (A.4), (A.9) and the first inequality in (A.10), the second inequality in (A.10) implies the following estimate tan ̸ (Yk+1 , Y) ≤

ck tan2 ̸ (Yk , Y)

(A.11)

1 − ck tan3 ̸ (Yk , Y)

which leads to (13) by different ways. For instance, if tan ̸ (Y0 , Y) ≤ min{1, ε, 1/(2b)},

(A.12)

where ε and b are constants in (A.2), (A.3), then for all k ≥ 0 the estimate (A.11) leads to (13) with c = 2b. Indeed, since (A.3) leads (A.2), then (A.12) implies c0 ≤ b, and therefore, c0 tan3 ̸ (Y0 , Y) ≤ 1/2. Using this inequality and (A.11) we obtain (13) with c = 2b for k = 0. But (A.12) and (13) with c = 2b and k = 0 implies tan ̸ (Y1 , Y) ≤ min{1, ε, 1/(2b)}. Applying now the induction we conclude that (13) with c = 2b holds for all k.



Appendix B. Choice of the preconditioning matrix Let A be a square nonsingular matrix and [Y , Y˜ ] be a unitary matrix of order n, where Y and Y˜ are submatrices of sizes n × p and n × (n − p), respectively, with 1 ≤ p < n. Consider the system Bx = b,

(B.1)

with a given n-component column b such that b = (I − YY )b and the matrix ∗

B = (I − YY ∗ )(A − cI )(I − ZZ ∗ )P −1 (I − YY ∗ ), where Z = ort(A∗ Y ), P is a nonsingular matrix of order n and c is an eigenvalue of the matrix C = (Y ∗ A−1 Y )−1 . In this section we briefly justify the following statement: if A is a discrete analog of an elliptic operator, P −1 ≈ A−1 and span(Y ) is an approximate invariant subspace of A corresponding to eigenvalues of smallest magnitude, then the number of GMRES iterations needed to solve (B.1) with a reasonable accuracy will be small. Note first that in exact arithmetic, applying l GMRES iterations with any initial guess x0 to the system (B.1) we obtain an approximate solution xˆ = Y˜ xˆ˜ where xˆ˜ is the result of l GMRES iterations applied to the reduced system B˜ x˜ = b˜

(B.2)

˜∗ 0

with the initial guess Y x , where b˜ = Y˜ ∗ b,

B˜ = Y˜ ∗ BY˜ = Y˜ ∗ (A − cI )(I − ZZ ∗ )P −1 Y˜ .

Thus, it is sufficient to consider system (B.2) instead of (B.1). Taking into account that ZZ ∗ = A∗ Y (Y ∗ AA∗ Y )−1 Y ∗ A,

G. El Khoury et al. / Journal of Computational and Applied Mathematics (

)



11

and therefore ZZ ∗ A−1 (I − YY ∗ ) = 0, one can draw the following formula: B = (I − YY ∗ ) I − cA−1 − (A − cI )(I − ZZ ∗ )(A−1 − P −1 ) (I − YY ∗ ).





Thus, B˜ = I − c Y˜ ∗ A−1 Y˜ − Y˜ ∗ (A − cI )(I − ZZ ∗ )(A−1 − P −1 )Y˜ .

(B.3)

Denote by λ1 , . . . , λn the eigenvalues of A ordered in non-decreasing order of magnitude. Let |λp | < |λp+1 |, P = A and span(Y ) be the invariant subspace of matrix A corresponding to the first p eigenvalues. Then, the spectrum of the matrix B˜ is

{1 − µi : i = 1, . . . , n − p}, where µi = c /λi+p with c ∈ {λ1 , . . . , λp }, and therefore |1 − µi | > 0. If the squared Frobenius norm

γ = ∥I − B˜ ∥2F ≤ c 2 ∥A−1 ∥2F is not large, then the convergence of GMRES iterations for the system (B.2) will be superlinear [16, Theorem 1]. Note that γ is small, for instance, if A is a discrete analog of an elliptic operator (as in the example considered in Section 6). In this case, GMRES will compute an approximate solution with a reasonable accuracy in a few iterations. When P −1 ≈ A−1 and span(Y ) is sufficiently close to the invariant subspace of A corresponding to eigenvalues λ1 , . . . , λp , we may expect that the number of GMRES iterations will be small as well. Moreover, a stopping criterion of the form

∥B˜ x˜ − b˜ ∥2 ≤ δ∥b˜ ∥2 = δ∥b∥2 with a fixed δ makes the number of iterations weakly dependent on Y . References [1] M.A. Freitag, A. Spence, Convergence of inexact inverse iteration with application to preconditioned iterative solves, BIT 47 (2007) 27–44. [2] M.A. Freitag, A. Spence, A tuned preconditioner for inexact inverse iteration applied to Hermitian eigenvalue problems, IMA J. Numer. Anal. 28 (2008) 522–551. [3] M. Robbe, M. Sadkane, A. Spence, Inexact inverse subspace iteration with preconditioning applied to non-Hermitian eigenvalue problems, SIAM J. Matrix Anal. Appl. 31 (2009) 92–113. [4] F. Xue, H.C. Elman, Convergence analysis of iterative solvers in inexact Rayleigh quotient iteration, SIAM J. Matrix Anal. Appl. 31 (2009) 877–899. [5] F. Xue, H.C. Elman, Fast inexact subspace iteration for generalized eigenvalue problems with spectral transformation, Linear Algebra Appl. 435 (2011) 601–622. [6] S.K. Godunov, Y.M. Nechepurenko, Bounds for the convergence rate of Newton’s method for calculating invariant subspaces, Comput. Math. Math. Phys. 42 (2002) 739–746. [7] K. Meerbergen, D. Roose, Matrix tranformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems, IMA J. Numer. Anal. 16 (1996) 297–346. [8] B. Philippe, M. Sadkane, Improving the spectral transformation block Arnoldi method, in: S.D. Margenov, P.S. Vassilevski (Eds.), IMACS Symposium on Iterative Methods in Linear Algebra, in: IMACS Series in Computatinal and Applied Mathematics, vol. 3, 1996, pp. 57–63. [9] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing, Boston, 1996. [10] G.H. Golub, Ch. Van Loan, Matrix Computations, second ed., The Johns Hopkins University Press, 1989. [11] R. Losche, H. Schwetlick, G. Timmermann, A modified block Newton iteration for approximating an invariant subspace of a symmetric matrix, Linear Algebra Appl. 275–276 (1998) 381–400. [12] D. Kressner, A block Newton method for nonlinear eigenvalue problems, Numer. Math. 114 (2009) 355–372. [13] S.K. Godunov, Modern Aspects of Linear Algebra, in: Translations of Mathematical Monographs, vol. 175, Amer. Math. Soc., Providence, RI, 1998. [14] G.W. Stewart, J.-G. Sun, Matrix Perturbation Theory, Academic Press, San Diego, California, 1990. [15] Yu.M. Nechepurenko, M. Sadkane, Convergence of the Newton–Kantorovich method for calculating invariant subspaces, Math. Notes 75 (2004) 101–106. [16] I. Kaporin, Superlinear convergence in minimum residual iterations, Numer. Linear Algebra Appl. 12 (2005) 453–470.