Journal of Computational and Applied Mathematics 367 (2020) 112485
A quadratically convergent algorithm for inverse generalized eigenvalue problems

Kensuke Aishima
Faculty of Computer and Information Sciences, Hosei University, Tokyo 184-8585, Japan
Article history: Received 20 March 2019; received in revised form 28 July 2019.

Keywords: Inverse eigenvalue problems; Generalized symmetric eigenvalue problems; Newton's method; Quadratic convergence; Multiple eigenvalues

Abstract: In 2018, for inverse generalized symmetric eigenvalue problems, a new quadratically convergent algorithm was discovered from simple matrix equations. Although this algorithm has some nice features compared with the standard Newton's method, it cannot be applied to multiple eigenvalues. In this paper, we propose a modified algorithm adapted to an arbitrary set of prescribed eigenvalues. Moreover, we prove its quadratic convergence in a neighborhood of the solutions that satisfy a mild condition. © 2019 Elsevier B.V. All rights reserved.
1. Introduction

Let A0, A1, ..., An, B0, B1, ..., Bn ∈ Rn×n be symmetric matrices. In addition, let c = [c1, c2, ..., cn]T ∈ Rn,

A(c) = A0 + c1A1 + · · · + cnAn,
B(c) = B0 + c1B1 + · · · + cnBn.

The purpose is to obtain c∗ such that A(c∗)x∗i = λ∗i B(c∗)x∗i (1 ≤ i ≤ n) for the prescribed eigenvalues λ∗1 ≤ · · · ≤ λ∗n. We assume that c∗ is locally unique and that B(c∗) is positive definite. Such inverse eigenvalue problems arise in inverse Sturm–Liouville problems, inverse vibration problems, nuclear spectroscopy, and structural design [1–5].

We consider numerical algorithms for solving the above problems. There are various Newton-type methods [4,6,7]. Recently, a quadratically convergent algorithm with some nice features was presented in [8, §5]. It is based on the idea of an iterative refinement algorithm for symmetric eigenvalue problems recently proposed in [9]. The algorithm in [8, §5] is described as follows. First of all, we focus on the matrix equations

X∗T B(c∗)X∗ = I,   X∗T A(c∗)X∗ = Λ∗,   Λ∗ = diag(λ∗1, ..., λ∗n).
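As a quick illustration of the problem data, the parameterized pencils and the residuals of the matrix equations above can be evaluated as follows. This is a minimal NumPy sketch; the helper names pencil and residuals are ours, not the paper's:

```python
import numpy as np

# Minimal sketch of the problem data: A(c) = A0 + c1*A1 + ... + cn*An and
# B(c) = B0 + c1*B1 + ... + cn*Bn.  Helper names are illustrative only.
def pencil(c, mats):
    # mats = [M0, M1, ..., Mn]
    return mats[0] + sum(ci * Mi for ci, Mi in zip(c, mats[1:]))

def residuals(c, X, lam, A_mats, B_mats):
    """Residuals of X^T B(c) X = I and X^T A(c) X = diag(lam)."""
    A, B = pencil(c, A_mats), pencil(c, B_mats)
    n = len(c)
    return X.T @ B @ X - np.eye(n), X.T @ A @ X - np.diag(lam)
```

At an exact solution (c∗, X∗) both residual matrices vanish, which is the condition the iteration below drives toward.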
In addition, an approximate matrix X(0) of X∗ := [x∗1, ..., x∗n] is given, where X∗ is normalized as X∗T B(c∗)X∗ = I. For the computed X(k) (k = 0, 1, ...) in the iterative process, define E(k) (k = 0, 1, ...) such that X(k) = X∗(I + E(k)). Then, we have the following relations:
I + E(k) + E(k)T + ∆1(k) = X(k)T B(c∗)X(k),
Λ∗ + Λ∗E(k) + E(k)T Λ∗ + ∆2(k) = X(k)T A(c∗)X(k),   (1)

where ∆1(k) := E(k)T E(k) and ∆2(k) := E(k)T Λ∗E(k).
E-mail address: [email protected]. https://doi.org/10.1016/j.cam.2019.112485
Omitting the quadratic terms, we obtain the key matrix equations

I + Ẽ(k) + Ẽ(k)T = X(k)T B(c(k+1))X(k),
Λ∗ + Λ∗Ẽ(k) + Ẽ(k)T Λ∗ = X(k)T A(c(k+1))X(k),   (2)
where the unknown variables Ẽ(k) and c(k+1) can be easily computed if λ∗1 < · · · < λ∗n, as in [8, §5.2]. However, if Λ∗ has multiple eigenvalues, the matrix equations in (2) have no solutions in general, and hence some modifications are required. From this viewpoint, we propose a modified algorithm adapted to multiple eigenvalues, whose quadratic convergence is naturally proved in a neighborhood of the solutions that satisfy a mild condition.

2. Proposed algorithm

For simplicity, we assume that the first p eigenvalues are multiple, i.e.,
λ∗1 = · · · = λ∗p < λ∗p+1 < · · · < λ∗n,

in the same manner as [4–6]. It is easy to generalize the following discussion to an arbitrary set of prescribed eigenvalues, where we exclude the case λ∗1 = · · · = λ∗n. In that case, the target problem reduces to the simple linear system A(c∗) = λ∗B(c∗); the proposed algorithm and its convergence theory do not cover this situation.

The idea is to employ the mathematical discussion in [10] for inverse standard symmetric eigenvalue problems. However, such a naive algorithm design is not sufficient for a rigorous convergence analysis. To prove quadratic convergence, we will derive essentially new matrix equations (13), as shown later.

The new algorithm is designed as follows. First of all, note that, even if the problem has a locally unique solution c∗, the corresponding X∗ is not unique. In other words, Xp∗ := [x∗1, ..., x∗p] associated with the multiple eigenvalues is not unique. More specifically, for any orthogonal matrix Q ∈ Rp×p, all the columns of Xp∗Q are also eigenvectors. Hence, for a given X(k), let
Y(k) := arg min_{X∗} ∥√B(c∗)(X∗ − X(k))∥F,   (3)
where ∥·∥F denotes the Frobenius norm. In addition, define F(k) that satisfies

X(k) = Y(k)(I + F(k)).   (4)
Then, we see

∥F(k)∥F = ∥√B(c∗)Y(k)F(k)∥F = ∥√B(c∗)(X(k) − Y(k))∥F = min_{X∗} ∥√B(c∗)(X∗ − X(k))∥F   (5)
from easy calculations. For a given X(k), the above F(k) is locally unique, and the leading principal p × p submatrix of F(k) is symmetric.

In the following, we prove this feature with the aid of the orthogonal Procrustes problem [11, §6.4.1]. In other words, for fixed Xp∗ and Xp(k), we consider the optimization problem

min_{Q(k)T Q(k) = I} ∥√B(c∗)Xp∗Q(k) − √B(c∗)Xp(k)∥F,   (6)
relevant to (3). We can obtain the optimal Q(k) using the polar decomposition

Q(k)T(k) := Xp∗T B(c∗)Xp(k),   (7)

where Q(k) is an orthogonal matrix and T(k) is a positive semi-definite matrix, as shown in [11, §6.4.1]. Note that T(k) in (7) is independent of the choice of Xp∗ because of the uniqueness of the polar decomposition, while the solution Q(k) depends on Xp∗. Therefore, for the solution Y(k) = [y1, ..., yn] in (3) with Yp(k) := [y1, ..., yp], we see

Yp(k)T B(c∗)Xp(k) = (√B(c∗)Xp∗Q(k))T √B(c∗)Xp(k) = T(k),

which implies that the leading principal p × p submatrix of F(k) in (4) is equal to the symmetric matrix T(k) − Ip (Ip ∈ Rp×p). Note that the above discussion is easily generalized to an arbitrary set of prescribed eigenvalues. In summary, we have the next lemma for any eigenvalue distribution λ∗1 ≤ · · · ≤ λ∗n, in the same manner as [10, Lemma 1].

Lemma 1. For any fixed X(k), suppose that Y(k) and F(k) are defined as in (3) and (4), respectively. Then, F(k) satisfies (5) and [F(k)]ij = [F(k)]ji for i, j corresponding to multiple eigenvalues λ∗i = λ∗j.

Our aim is to obtain F̃(k) approximating F(k), where the key equations for F̃(k) are derived as follows. Similarly to (1), it is easy to see that
I + F(k) + F(k)T + ∆1(k) = X(k)T B(c∗)X(k),   ∆1(k) := F(k)T F(k),   (8)
Λ∗ + Λ∗F(k) + F(k)T Λ∗ + ∆2(k) = X(k)T A(c∗)X(k),   ∆2(k) := F(k)T Λ∗F(k).   (9)
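As an aside, the optimal rotation in the Procrustes problem (6) and the polar factors in (7) can be obtained from a single SVD. The following NumPy sketch computes both; the function and variable names are ours, purely illustrative:

```python
import numpy as np

# Sketch: solving the weighted orthogonal Procrustes problem (6) via the
# polar decomposition (7).  Names are illustrative, not from the paper.
def procrustes_factors(Xp_star, Xp_k, B_star):
    """Return (Q, T) with Q T = Xp_star^T B_star Xp_k,
    Q orthogonal and T symmetric positive semi-definite."""
    M = Xp_star.T @ B_star @ Xp_k
    U, s, Vt = np.linalg.svd(M)
    Q = U @ Vt                      # orthogonal polar factor (the optimal rotation)
    T = Vt.T @ np.diag(s) @ Vt      # positive semi-definite polar factor
    return Q, T
```

Since T depends only on the product M = Xp∗T B(c∗)Xp(k), it is independent of the particular choice of Xp∗, matching the remark after (7).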
Noting the diagonal elements of (8) and (9), we have the following linearized equations:

[I + F̃(k) + F̃(k)T]ii = [X(k)T B(c(k+1))X(k)]ii,
[Λ∗ + Λ∗F̃(k) + F̃(k)T Λ∗]ii = [X(k)T A(c(k+1))X(k)]ii   (10)
for i = 1, ..., n. The linearization for the off-diagonal elements is slightly different from the basic version for the situation where the eigenvalues are all distinct. For i ≠ j corresponding to λ∗i ≠ λ∗j, let

[I + F̃(k) + F̃(k)T]ij = [X(k)T B(c(k+1))X(k)]ij,
[Λ∗ + Λ∗F̃(k) + F̃(k)T Λ∗]ij = [X(k)T A(c(k+1))X(k)]ij   (11)
in the same manner as the basic algorithm. However, for i ≠ j corresponding to λ∗i = λ∗j, letting m denote any integer such that λ∗m ≠ λ∗i = λ∗j, we consider

X(k)T A(c(k+1))X(k) − λ∗m X(k)T B(c(k+1))X(k)   (λ∗m ≠ λ∗i = λ∗j).   (12)
This viewpoint is essentially different from [10, §2]. For simplicity, we use the eigenvalue λ∗m as in (12), though λ∗m can be replaced by any real number in a particular range; the details are explained in Remark 1 after the main theorem (Theorem 1). The above calculation in (12) corresponds to

Λ∗ + Λ∗F̃(k) + F̃(k)T Λ∗ − λ∗m(I + F̃(k) + F̃(k)T) = (Λ∗ − λ∗m I) + (Λ∗ − λ∗m I)F̃(k) + F̃(k)T(Λ∗ − λ∗m I).

Thus, on the basis of the symmetry of F(k) in Lemma 1, let
[(Λ∗ − λ∗m I)F̃(k) + F̃(k)T(Λ∗ − λ∗m I)]ij = [X(k)T A(c(k+1))X(k) − λ∗m X(k)T B(c(k+1))X(k)]ij,
[F̃(k)]ij = [F̃(k)]ji   (13)
because the diagonal matrix Λ∗ − λ∗m I is not required. Similarly to the basic algorithm in [8, §5.2], the above equations can be easily solved as follows. First, we compute c(k+1). Let

[J(k)]ij = xi(k)T(Aj − λ∗i Bj)xi(k)   (1 ≤ i, j ≤ n),   (14)
[d(k)]i = −xi(k)T(A0 − λ∗i B0)xi(k)   (1 ≤ i ≤ n).   (15)
Then, from (10), we obtain

c(k+1) = (J(k))−1 d(k).   (16)
In addition, noting the first equation in (10), we see

[F̃(k)]ii = (xi(k)T B(c(k+1))xi(k) − 1)/2   (1 ≤ i ≤ n).   (17)
Next, we compute the off-diagonal parts of F̃(k) corresponding to the distinct eigenvalues λ∗i ≠ λ∗j. Using (16), in each (i, j) element of (11) we see the following 2 × 2 linear system:

[F̃(k)]ij + [F̃(k)]ji = xi(k)T B(c(k+1))xj(k),
λ∗i [F̃(k)]ij + λ∗j [F̃(k)]ji = xi(k)T A(c(k+1))xj(k)   (1 ≤ i, j ≤ n, λ∗i ≠ λ∗j).
Therefore, we obtain

[F̃(k)]ij = (xi(k)T A(c(k+1))xj(k) − λ∗j xi(k)T B(c(k+1))xj(k)) / (λ∗i − λ∗j)   (1 ≤ i, j ≤ n, λ∗i ≠ λ∗j).   (18)
Finally, for i ≠ j corresponding to multiple eigenvalues λ∗i = λ∗j, letting λ∗m ≠ λ∗i = λ∗j, we have

[F̃(k)]ij = (xi(k)T A(c(k+1))xj(k) − λ∗m xi(k)T B(c(k+1))xj(k)) / (2(λ∗i − λ∗m))   (1 ≤ i, j ≤ n, i ≠ j, λ∗i = λ∗j)   (19)
from (13). As a result, we can compute the next step

X(k+1) = X(k)(I − F̃(k)),   (20)

where I − F̃(k) is the first-order approximation of (I + F̃(k))−1 by the Neumann series. In Algorithm 1, we present the proposed algorithm, where lines 9 and 10 are essentially different from [10, Algorithm 1].
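The truncation behind the update (20) can be made explicit: for ∥F̃(k)∥ < 1, the Neumann series gives

```latex
(I + \tilde{F}^{(k)})^{-1} = I - \tilde{F}^{(k)} + (\tilde{F}^{(k)})^{2} - (\tilde{F}^{(k)})^{3} + \cdots,
```

so replacing the inverse by I − F̃(k) discards only terms of order ∥F̃(k)∥², which does not affect the quadratic convergence order.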
Table 1
Convergence behavior of ∥F(k)∥ and ∥c(k) − c∗∥ for n = 8 and Rank(Â) = 5.

                                                      k = 0      k = 1      k = 2      k = 3
Proposed algorithm (Algorithm 1)       ∥F(k)∥        5.12E−02   2.52E−03   6.46E−06   6.18E−11
                                       ∥c(k) − c∗∥   3.40E−03   2.57E−03   6.46E−06   1.13E−10
Newton method (Algorithm 2.1 in [4])   ∥F(k)∥        5.12E−02   2.77E−03   1.01E−05   6.40E−11
                                       ∥c(k) − c∗∥   3.40E−03   2.57E−03   8.64E−06   5.22E−11
Table 2
Convergence behavior of ∥F(k)∥ and ∥c(k) − c∗∥ for n = 100 and Rank(Â) = 70.

                                                      k = 0      k = 1      k = 2      k = 3
Proposed algorithm (Algorithm 1)       ∥F(k)∥        3.17E−02   1.33E−03   1.51E−05   1.21E−09
                                       ∥c(k) − c∗∥   3.02E−02   6.29E−04   5.62E−06   5.40E−10
Newton method (Algorithm 2.1 in [4])   ∥F(k)∥        3.17E−02   1.60E−03   1.42E−05   1.85E−09
                                       ∥c(k) − c∗∥   3.02E−02   6.29E−04   7.11E−06   6.44E−10
Algorithm 1 The proposed algorithm
Require: λ∗1 ≤ · · · ≤ λ∗n, A0, ..., An, B0, ..., Bn ∈ Rn×n, X(0) ∈ Rn×n.
 1: for k := 0, 1, ... do
 2:   [J(k)]ij = xi(k)T(Aj − λ∗i Bj)xi(k)
 3:   [d(k)]i = −xi(k)T(A0 − λ∗i B0)xi(k)
 4:   c(k+1) = (J(k))−1 d(k)
 5:   R(k) = X(k)T B(c(k+1))X(k)
 6:   [F̃(k)]ii = ([R(k)]ii − 1)/2   (1 ≤ i ≤ n)
 7:   S(k) = X(k)T A(c(k+1))X(k)
 8:   if λ∗i = λ∗j then
 9:     Choose m such that λ∗m ≠ λ∗i = λ∗j
10:     [F̃(k)]ij = ([S(k)]ij − λ∗m[R(k)]ij)/(2(λ∗i − λ∗m))   (1 ≤ i, j ≤ n, i ≠ j)
11:   else
12:     [F̃(k)]ij = ([S(k)]ij − λ∗j[R(k)]ij)/(λ∗i − λ∗j)   (1 ≤ i, j ≤ n, i ≠ j)
13:   end if
14:   X(k+1) = X(k)(I − F̃(k))
15: end for
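The per-iteration computations (14)–(20) can be condensed into a short NumPy sketch. This is our illustrative reimplementation, not the paper's MATLAB code; the function name, the data layout A_mats = [A0, ..., An] and B_mats = [B0, ..., Bn], and the exact floating-point comparison used to detect multiple prescribed eigenvalues are our own assumptions:

```python
import numpy as np

# One sweep of Algorithm 1 (a sketch; names and data layout are ours).
def algorithm1_step(X, lam, A_mats, B_mats):
    n = len(lam)
    J = np.empty((n, n))
    d = np.empty(n)
    for i in range(n):
        for j in range(n):
            J[i, j] = X[:, i] @ (A_mats[j + 1] - lam[i] * B_mats[j + 1]) @ X[:, i]  # (14)
        d[i] = -X[:, i] @ (A_mats[0] - lam[i] * B_mats[0]) @ X[:, i]                # (15)
    c = np.linalg.solve(J, d)                                                       # (16)
    A = A_mats[0] + sum(ck * Ak for ck, Ak in zip(c, A_mats[1:]))
    B = B_mats[0] + sum(ck * Bk for ck, Bk in zip(c, B_mats[1:]))
    R = X.T @ B @ X
    S = X.T @ A @ X
    F = np.zeros((n, n))
    for i in range(n):
        F[i, i] = (R[i, i] - 1.0) / 2.0                                             # (17)
        for j in range(n):
            if i == j:
                continue
            if lam[i] != lam[j]:
                F[i, j] = (S[i, j] - lam[j] * R[i, j]) / (lam[i] - lam[j])          # (18)
            else:
                m = next(m for m in range(n) if lam[m] != lam[i])                   # line 9
                F[i, j] = (S[i, j] - lam[m] * R[i, j]) / (2.0 * (lam[i] - lam[m]))  # (19)
    return c, X @ (np.eye(n) - F)                                                   # (20)
```

Iterating c, X = algorithm1_step(X, lam, A_mats, B_mats) from a sufficiently close X(0) exhibits the rapid decrease of ∥c(k) − c∗∥ of the kind reported in Tables 1 and 2.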
We present simple numerical results to illustrate the convergence behavior; the experiments were performed in MATLAB. We compare the proposed algorithm with an existing efficient method, a standard Newton method (Algorithm 2.1 in [4]) that solves a generalized eigenvalue problem for (A(c(k)), B(c(k))) in each iteration. We used MATLAB's built-in function eig for solving the generalized eigenvalue problems.

The first example is based on [5, Example 2]. Using a rectangular matrix V ∈ R8×5, we define Â := VVT. In addition, let A0 = O and Ak = Σ_{j=1}^{k−1} [Â]kj(ek ejT + ej ekT) + [Â]kk ek ekT (k = 1, ..., 8), where ej are the jth unit vectors for j = 1, ..., 8. Moreover, let B0 = I and Bk = (1 + k/10)ek ekT (k = 1, ..., 8). Suppose that c∗ = [1, 1, ..., 1]T is a solution of the inverse eigenvalue problem for the above matrix set. Then, there are three multiple eigenvalues, λ∗1 = λ∗2 = λ∗3 = 0, and B(c∗) is a positive definite matrix. The initial guess X(0) is an eigenvector matrix of (A(c(0)), B(c(0))) corresponding to c(0) := c∗ + ϵ, where ϵ is randomly chosen. For the random vector, we used MATLAB's built-in function randn with rng('default'). The convergence behavior is shown in Table 1. The proposed algorithm is designed to reduce ∥F(k)∥ quadratically, as proved in the next section, whereas the Newton method (Algorithm 2.1 in [4]) aims at quadratic convergence of ∥c(k) − c∗∥. Hence, Table 1 displays the values of both ∥F(k)∥ and ∥c(k) − c∗∥.

The second example is a large-scale problem. Using a rectangular Gaussian matrix V ∈ R100×70, we define Â := VVT ∈ R100×100. The problem setting is similar to the first example. In other words, let A0 = O and Ak = Σ_{j=1}^{k−1} [Â]kj(ek ejT + ej ekT) + [Â]kk ek ekT (k = 1, ..., 100). In addition, let B0 = I and Bk = (1 + k/10)ek ekT (k = 1, ..., 100). Let c∗ = [1, 1, ..., 1]T be a solution of the inverse eigenvalue problem for the above matrix set.
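Both test problems follow the same construction pattern. A parameterized sketch might look as follows; the variable names are ours, and NumPy's generator stands in for MATLAB's randn:

```python
import numpy as np

# Parameterized sketch of the test-problem construction used in both
# examples (n = 8, r = 5 and n = 100, r = 70); names are illustrative.
def build_example(n, r, seed=0):
    rng = np.random.default_rng(seed)        # the paper uses MATLAB's randn
    V = rng.standard_normal((n, r))
    Ahat = V @ V.T                           # rank-r symmetric matrix
    A_mats = [np.zeros((n, n))]              # A0 = O
    B_mats = [np.eye(n)]                     # B0 = I
    for k in range(n):
        Ak = np.zeros((n, n))
        Ak[k, :k] = Ak[:k, k] = Ahat[k, :k]  # Ahat[k,j](e_k e_j^T + e_j e_k^T), j < k
        Ak[k, k] = Ahat[k, k]
        A_mats.append(Ak)
        Bk = np.zeros((n, n))
        Bk[k, k] = 1 + (k + 1) / 10          # B_k = (1 + k/10) e_k e_k^T, 1-based k
        B_mats.append(Bk)
    return A_mats, B_mats, Ahat
```

With c∗ = [1, ..., 1]T, the sum Σk Ak reassembles Â and B(c∗) = I + Σk Bk is positive definite, so the pencil (A(c∗), B(c∗)) has n − r zero eigenvalues; build_example(8, 5) reproduces the structure of the first example and build_example(100, 70) that of the second.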
Then, there are 30 multiple eigenvalues, and B(c∗) is a positive definite matrix. The initial guess X(0) is an eigenvector matrix of (A(c(0)), B(c(0))) corresponding to c(0) := c∗ + ϵ, where ϵ is randomly chosen. Table 2 displays the values of ∥F(k)∥ and ∥c(k) − c∗∥.

From the results in the above two examples, we see that the proposed algorithm is quadratically convergent in some neighborhood of the exact solution. In addition, the proposed algorithm has very similar local behavior to the standard
Newton method, even though the proposed algorithm does not require solving a generalized eigenvalue problem in each iteration. These results are consistent with [5, §4].

One may notice that it is possible to improve the Newton method (Algorithm 2.1 in [4]) because the computation of c(k+1) as in line 4 can be replaced with inexact linear system solvers. Moreover, as in [4, Algorithm 5.1], another linear system for c(k+1) can be derived from a slightly different problem setting. In particular problems, the convergence behavior of [4, Algorithm 5.1] is better than that of the standard Newton method (Algorithm 2.1 in [4]), as presented in [4, §7]. Such an improvement might also be effective for the proposed algorithm in this paper. However, the main contribution of this paper is to prove quadratic convergence of the proposed algorithm. As discussed in [10, §4], it is significant to obtain the quadratic convergence theorem in a neighborhood of the solution in the same manner as [10, §3].

3. Convergence analysis

In this section, we prove quadratic convergence of the proposed algorithm under the assumption that X(k) for some k is sufficiently close to a solution. The proof is similar to [10, §3]. Lemma 4 is essentially new, relevant to the algorithm design using (13) in the previous section. It is worth noting that the improvement based on (13) leads to the following well-organized proof in the same manner as [10, §3].

In the following, we would like to use the spectral norm as usual, though Y(k) is the optimal matrix in terms of the Frobenius norm as in Lemma 1. For this purpose, we use the next lemma, proved in the same manner as [10, Lemma 2], where X(k), X∗, Y(k) in its proof must be replaced with √B(c∗)X(k), √B(c∗)X∗, √B(c∗)Y(k).

Lemma 2 ([10]). For a given X(k) ∈ Rn×n, let {X∗} be the set of the n × n eigenvector matrices for (A(c∗), B(c∗)) and

E(k) := {E(k) | X(k) = X∗(I + E(k))}.   (21)

In addition, F(k) is defined as in Lemma 1. Then, for any E(k) ∈ E(k),

∥F(k)∥ ≤ 3∥E(k)∥.   (22)
Next, we focus on the relationship between F(k) and F̃(k). From (8), (9), (10), (11), and (13), we have, with

∆F̃(k) := F(k) − F̃(k),   A∆(k) := A(c∗) − A(c(k+1)),   B∆(k) := B(c∗) − B(c(k+1)),

the relations

[∆F̃(k) + ∆F̃(k)T + ∆1(k)]ii = [X(k)T B∆(k) X(k)]ii,
[Λ∗∆F̃(k) + ∆F̃(k)T Λ∗ + ∆2(k)]ii = [X(k)T A∆(k) X(k)]ii   (i = 1, ..., n),   (23)

[∆F̃(k) + ∆F̃(k)T + ∆1(k)]ij = [X(k)T B∆(k) X(k)]ij,
[Λ∗∆F̃(k) + ∆F̃(k)T Λ∗ + ∆2(k)]ij = [X(k)T A∆(k) X(k)]ij   (if i ≠ j and λ∗i ≠ λ∗j),   (24)

[(Λ∗ − λ∗m I)∆F̃(k) + ∆F̃(k)T(Λ∗ − λ∗m I) + ∆3(k)]ij = [X(k)T(A∆(k) − λ∗m B∆(k))X(k)]ij,
[∆F̃(k)]ij = [∆F̃(k)]ji   (if i ≠ j and λ∗i = λ∗j),   (25)

where ∥∆1(k)∥ ≤ ∥F(k)∥², ∥∆2(k)∥ ≤ ∥Λ∗∥∥F(k)∥², and ∆3(k) = ∆2(k) − λ∗m ∆1(k).
Recall that, in (25), m is any number such that λ∗m ≠ λ∗i = λ∗j.

For the convergence analysis, define J̄(k) as [J̄(k)]ij = yi(k)T(Aj − λ∗i Bj)yi(k), in the same manner as line 2 in Algorithm 1. Suppose that J̄(k) is nonsingular. Then, we have

∥J̄(k)−1∥ ∥B(c∗)−1∥ √(Σ_{j=1}^{n} max_{1≤i≤n} ∥Aj − λ∗i Bj∥²) ≥ 1

because

∥J̄(k)−1∥−1 ≤ √(Σ_{j=1}^{n} min_{1≤i≤n} |yi(k)T(Aj − λ∗i Bj)yi(k)|²) ≤ √(Σ_{j=1}^{n} max_{1≤i≤n} |yi(k)T(Aj − λ∗i Bj)yi(k)|²) ≤ ∥B(c∗)−1∥ √(Σ_{j=1}^{n} max_{1≤i≤n} ∥Aj − λ∗i Bj∥²).
Using J̄(k), we introduce

α(k) := √n ∥J̄(k)−1∥ ∥B(c∗)−1∥ √(Σ_{j=1}^{n} max_{1≤i≤n} ∥Aj − λ∗i Bj∥²) ≥ √n.   (26)
Then, we prove the next lemma analogously to [8, §4].

Lemma 3. Suppose that Algorithm 1 is applied to X(0) ∈ Rn×n. Moreover, for some k, suppose that

∥F(k)∥ ≤ min_{λ∗i≠λ∗j} |λ∗i − λ∗j| / (18n∥Λ∗∥(1 + α(k))),   (27)
where F(k) is defined as in Lemma 1, and α(k) is defined as in (26). Then, we obtain

|[F̃(k)]ii − [F(k)]ii| ≤ (∥Λ∗∥(1 + α(k)) / (β(k)|λ∗i − λ∗j|)) ∥F(k)∥²   (i = 1, ..., n, λ∗j ≠ λ∗i),   (28)
|[F̃(k)]ij − [F(k)]ij| ≤ (2∥Λ∗∥(1 + α(k)) / (β(k)|λ∗i − λ∗j|)) ∥F(k)∥²   (i = 1, ..., n, j = 1, ..., n, λ∗i ≠ λ∗j),   (29)

where

β(k) := 1 − α(k)(2 + ∥F(k)∥)∥F(k)∥ ≥ ρ1 := 575/648.   (30)
Proof. First, we discuss ∥c(k+1) − c∗∥ using (23). From J(k) as in (14), we obtain

∥c(k+1) − c∗∥ ≤ ∥J(k)−1∥ √n (∥Λ∗∥∥∆1(k)∥ + ∥∆2(k)∥) ≤ 2∥J(k)−1∥ √n ∥Λ∗∥∥F(k)∥².   (31)

Concerning ∥J(k)−1∥, noting xi(k) = yi(k) + Σ_{ℓ=1}^{n} [F(k)]ℓi yℓ(k) from the definition in (4), we see

|[J̄(k)]ij − [J(k)]ij| = |yi(k)T(Aj − λ∗i Bj)yi(k) − xi(k)T(Aj − λ∗i Bj)xi(k)|
= |2yi(k)T(Aj − λ∗i Bj) Σ_{ℓ=1}^{n} [F(k)]ℓi yℓ(k) + (Σ_{ℓ=1}^{n} [F(k)]ℓi yℓ(k))T (Aj − λ∗i Bj) Σ_{ℓ=1}^{n} [F(k)]ℓi yℓ(k)|
≤ (2 + ∥F(k)∥)∥F(k)∥ ∥Aj − λ∗i Bj∥ ∥B(c∗)−1∥

from ∥yi(k)∥ ≤ √∥B(c∗)−1∥ and ∥Σ_{ℓ=1}^{n} [F(k)]ℓi yℓ(k)∥ ≤ √∥B(c∗)−1∥ ∥F(k)∥. Using Weyl's inequality for singular values, we have
∥J(k)−1∥−1 ≥ ∥J̄(k)−1∥−1 − √n (2 + ∥F(k)∥)∥F(k)∥ √(Σ_{ℓ=1}^{n} max_{1≤i≤n} ∥Aℓ − λ∗i Bℓ∥²) ∥B(c∗)−1∥
= ∥J̄(k)−1∥−1 (1 − α(k)(2 + ∥F(k)∥)∥F(k)∥).

We prove here that 1 − α(k)(2 + ∥F(k)∥)∥F(k)∥ on the right-hand side is positive. Since we assume (27), we have
∥F(k)∥ ≤ (1/n) · 1/(9(1 + α(k))) ≤ min(1/36, 1/(18α(k)))   (32)

for n ≥ 2 and α(k) ≥ 1. Hence, 1 − α(k)(2 + ∥F(k)∥)∥F(k)∥ ≥ 1 − (2 + 1/36)/18 = 575/648. Thus, we obtain (30) and
∥J(k)−1∥ ≤ ∥J̄(k)−1∥ (1 − α(k)(2 + ∥F(k)∥)∥F(k)∥)−1 = ∥J̄(k)−1∥ / β(k).   (33)

Using the inequalities (31) and (33), we estimate the off-diagonal elements of F̃(k) − F(k) corresponding to λ∗i ≠ λ∗j (1 ≤ i, j ≤ n). Similarly to the calculations in (18), we have
|[F̃(k)]ij − [F(k)]ij| ≤ (|λ∗j||[∆1(k)]ij| + |[∆2(k)]ij| + |Σ_{ℓ=1}^{n} (cℓ(k+1) − cℓ∗) xi(k)T(Aℓ − λ∗j Bℓ)xj(k)|) / |λ∗i − λ∗j|   (34)
from (24). In addition, it is easy to see that

|Σ_{ℓ=1}^{n} (cℓ(k+1) − cℓ∗) xi(k)T(Aℓ − λ∗j Bℓ)xj(k)| ≤ Σ_{ℓ=1}^{n} |cℓ(k+1) − cℓ∗| ∥Aℓ − λ∗j Bℓ∥ (1 + ∥F(k)∥)² ∥B(c∗)−1∥
≤ (1 + ∥F(k)∥)² ∥c(k+1) − c∗∥ √(Σ_{ℓ=1}^{n} ∥Aℓ − λ∗j Bℓ∥²) ∥B(c∗)−1∥.   (35)
Therefore, we have

|[F̃(k)]ij − [F(k)]ij| ≤ (2∥Λ∗∥∥F(k)∥² + (1 + ∥F(k)∥)² ∥c(k+1) − c∗∥ √(Σ_{ℓ=1}^{n} ∥Aℓ − λ∗j Bℓ∥²) ∥B(c∗)−1∥) / |λ∗i − λ∗j|
≤ 2∥Λ∗∥∥F(k)∥² (1 + √n ∥J(k)−1∥ (1 + ∥F(k)∥)² √(Σ_{ℓ=1}^{n} ∥Aℓ − λ∗j Bℓ∥²) ∥B(c∗)−1∥) / |λ∗i − λ∗j|
≤ (2∥Λ∗∥ / |λ∗i − λ∗j|) (1 + α(k)(1 + ∥F(k)∥)² / (1 − α(k)(2 + ∥F(k)∥)∥F(k)∥)) ∥F(k)∥²,   (36)
where the first inequality is due to (34) and (35), the second inequality is due to (31), and the third inequality is due to (26) and (33). From the definition of β(k) in (30), we obtain (29).

Finally, to prove (28), we estimate the diagonal elements of F̃(k) − F(k). For any λ∗j ≠ λ∗i, noting X(k)T A∆(k) X(k) − λ∗j X(k)T B∆(k) X(k) in (23), we have

2|[F̃(k)]ii − [F(k)]ii| ≤ (|λ∗j||[∆1(k)]ii| + |[∆2(k)]ii| + |Σ_{ℓ=1}^{n} (cℓ(k+1) − cℓ∗) xi(k)T(Aℓ − λ∗j Bℓ)xi(k)|) / |λ∗i − λ∗j| ≤ (2∥Λ∗∥(1 + α(k)) / (β(k)|λ∗i − λ∗j|)) ∥F(k)∥²   (37)

in the same manner as the calculations in (34), (35), and (36). □
The next lemma is crucial to complete the proof of the quadratic convergence.

Lemma 4. Suppose that Algorithm 1 is applied to X(0) ∈ Rn×n. Moreover, for some k, suppose that (27) is satisfied, where F(k) is defined as in Lemma 1, and α(k) is defined as in (26). Then, for all i ≠ j corresponding to λ∗i = λ∗j, we obtain

|[F̃(k)]ij − [F(k)]ij| ≤ (∥Λ∗∥(1 + α(k)) / (β(k)|λ∗i − λ∗m|)) ∥F(k)∥²,   (38)

where m is any number such that λ∗m ≠ λ∗i = λ∗j, and β(k) is defined as in (30).

Proof. For any λ∗m ≠ λ∗i, using (25), we have

2|[F̃(k)]ij − [F(k)]ij| ≤ (|λ∗m||[∆1(k)]ij| + |[∆2(k)]ij| + |Σ_{ℓ=1}^{n} (cℓ(k+1) − cℓ∗) xi(k)T(Aℓ − λ∗m Bℓ)xj(k)|) / |λ∗i − λ∗m| ≤ (2∥Λ∗∥(1 + α(k)) / (β(k)|λ∗i − λ∗m|)) ∥F(k)∥²

in the same manner as (37). □

As a result, we obtain the next lemma in the same manner as [10, Lemma 3].

Lemma 5. Suppose that Algorithm 1 is applied to X(0) ∈ Rn×n. Moreover, for some k, suppose that (27) is satisfied, where F(k) is defined as in Lemma 1, and α(k) is defined as in (26). For the computed X(k+1), define G(k+1) such that

X(k+1) = Y(k)(I + G(k+1)).   (39)
Then, we obtain

∥F̃(k) − F(k)∥ ≤ (2n∥Λ∗∥(1 + α(k)) / (β(k) min_{λ∗i≠λ∗j} |λ∗i − λ∗j|)) ∥F(k)∥²,   (40)
∥G(k+1)∥ ≤ (2n∥Λ∗∥(1 + α(k))(1 + ∥F(k)∥) / (β(k) min_{λ∗i≠λ∗j} |λ∗i − λ∗j|) + 1) ∥F(k)∥²,   (41)
∥F(k+1)∥ ≤ 3∥G(k+1)∥,   (42)

where β(k) is defined as in (30).
Proof. We see (40) from Lemmas 3 and 4. Noting (39) and X(k+1) = X(k)(I − F̃(k)) = Y(k)(I + F(k))(I − F̃(k)), we see

G(k+1) = F(k) − F̃(k) + F(k)(F(k) − F̃(k)) − F(k)².   (43)

It then follows that

∥G(k+1)∥ ≤ (1 + ∥F(k)∥)∥F̃(k) − F(k)∥ + ∥F(k)∥².   (44)

Thus, we have (41). From Lemma 2, we obtain (42). □

Next, we estimate an upper bound of {α(k)}∞_{k=0}. The proof of the next lemma is nearly the same as the proof of [10, Lemma 4]; the difference from [10, Lemma 4] is to use √B(c∗) appropriately.

Lemma 6.
Under the same assumptions as in Lemma 5, we have

α(k+1) ≤ α(k) / (1 − α(k)(2 + 4∥G(k+1)∥) · 4∥G(k+1)∥).   (45)

In addition,

∥G(k+1)∥ ≤ ρ2 ∥F(k)∥,   ρ2 := 3239/20700.   (46)
Proof. Let Y(k+1) = Y(k)(I + G̃(k+1)). Combining this with the estimation in (33) for X(k) = Y(k)(I + F(k)), we have

∥J̄(k+1)−1∥ ≤ ∥J̄(k)−1∥ (1 − α(k)(2 + ∥G̃(k+1)∥)∥G̃(k+1)∥)−1.   (47)

To estimate ∥G̃(k+1)∥, we use the relations in Lemma 5:

X(k+1) = Y(k)(I + G(k+1)) = Y(k+1)(I + F(k+1)),

which implies

∥G̃(k+1)∥ = ∥√B(c∗)Y(k)G̃(k+1)∥ = ∥√B(c∗)(Y(k+1) − Y(k))∥ ≤ ∥F(k+1)∥ + ∥G(k+1)∥ ≤ 4∥G(k+1)∥.   (48)

Therefore, noting (47) and the definition of α(k) in (26), we obtain (45). Moreover, we see (46) from

∥G(k+1)∥ ≤ ((2n∥Λ∗∥(1 + α(k))∥F(k)∥ / min_{λ∗i≠λ∗j} |λ∗i − λ∗j|) · ((1 + ∥F(k)∥)/β(k)) + ∥F(k)∥) ∥F(k)∥ ≤ ((1/9) · (648/575) · (1 + 1/36) + 1/36) ∥F(k)∥,
where the first inequality is due to (41), and the second inequality is due to (27), (30), and (32), respectively. □

Now we have all the required lemmas to prove quadratic convergence in the same manner as [10]. The next lemma, proved in the same manner as [10, Lemma 5], states an important relationship between F(k) and α(k).

Lemma 7. Under the same assumptions as in Lemma 5, suppose that there exist k and γ(k) such that

0 ≤ γ(k) ≤ 1,   ∥F(k)∥ = γ(k) · min_{λ∗i≠λ∗j} |λ∗i − λ∗j| / (18n∥Λ∗∥(1 + α(k))).   (49)
Then, we obtain

∥F(k+1)∥ < ρ3 · γ(k) · min_{λ∗i≠λ∗j} |λ∗i − λ∗j| / (18n∥Λ∗∥(1 + α(k+1))),   ρ3 := 3ρ2/ρ1,   (50)

where ρ3 = 0.529··· < 1.
Proof. Similarly to (32), we have

∥F(k)∥ ≤ γ(k) · (1/n) · 1/(9(1 + α(k))) ≤ γ(k) · min(1/36, 1/(18α(k))).   (51)

From Lemma 6, we have

α(k+1) ≤ α(k)/β(k) ≤ α(k)/ρ1,
where β(k) and ρ1 are defined in (30). It then follows that

1/(α(k) + 1) < (1/ρ1) · 1/(α(k+1) + 1),

which implies

γ(k) · min_{λ∗i≠λ∗j} |λ∗i − λ∗j| / (18n∥Λ∗∥(1 + α(k))) < (γ(k)/ρ1) · min_{λ∗i≠λ∗j} |λ∗i − λ∗j| / (18n∥Λ∗∥(1 + α(k+1))).

Therefore, using (42), (46), and (49), we obtain (50). □
Using the above lemmas, we prove the quadratic convergence of the proposed algorithm as follows. The next theorem is proved in the same manner as [10, Theorem 1].

Theorem 1. Let A0, ..., An, B0, ..., Bn be real symmetric n × n matrices and λ∗1 ≤ · · · ≤ λ∗n be prescribed eigenvalues, excluding the case λ∗1 = · · · = λ∗n. Suppose that Algorithm 1 is applied to X(0) ∈ Rn×n. In addition, Y(k), F(k) (k = 0, 1, ...) are defined as in Lemma 1. Moreover, for some k, suppose that F(k) satisfies (27), where B(c∗) is positive definite, associated with the local solution. Then, we obtain

∥F(k+ℓ)∥ ≤ (3ρ2)^ℓ ∥F(k)∥   (ℓ = 0, 1, ...),   (52)
lim sup_{ℓ→∞} ∥F(k+ℓ+1)∥/∥F(k+ℓ)∥² ≤ 3 (2n∥Λ∗∥(1 + ρ4 α(k)) / min_{λ∗i≠λ∗j} |λ∗i − λ∗j| + 1),   (53)

where ρ2 is defined as in (46) and

ρ4 := exp((73/575) · 1/(1 − ρ3))   (= 1.309···)
for ρ3 defined as in (50).

Proof. Define γ(k+ℓ) (ℓ = 1, 2, ...) such that

∥F(k+ℓ)∥ = γ(k+ℓ) · min_{λ∗i≠λ∗j} |λ∗i − λ∗j| / (18n∥Λ∗∥(1 + α(k+ℓ)))

in the same manner as (49). Then, we see

γ(k+ℓ) ≤ ρ3^ℓ γ(k) ≤ ρ3^ℓ ≤ 1   (ℓ = 0, 1, ...)   (54)
from Lemma 7. Therefore, using (27) and Lemmas 5 and 6, we obtain (52). In addition, we see

∥F(k+ℓ)∥ ≤ ρ3^ℓ/36,   α(k+ℓ)∥F(k+ℓ)∥ ≤ ρ3^ℓ/18   (ℓ = 0, 1, ...)   (55)

from (51) and (54). Hence, we have

α(k+ℓ+1) ≤ α(k+ℓ) / (1 − α(k+ℓ)(2 + 4∥G(k+ℓ+1)∥) · 4∥G(k+ℓ+1)∥)
≤ α(k+ℓ) / (1 − α(k+ℓ)(2 + ∥F(k+ℓ)∥)∥F(k+ℓ)∥)
= α(k+ℓ) (1 + α(k+ℓ)(2 + ∥F(k+ℓ)∥)∥F(k+ℓ)∥ / (1 − α(k+ℓ)(2 + ∥F(k+ℓ)∥)∥F(k+ℓ)∥))
≤ α(k+ℓ) (1 + (648/575) · (2 + 1/36) · ρ3^ℓ/18),

where the first inequality is due to (45), the second inequality is due to 4∥G(k+ℓ+1)∥ ≤ ∥F(k+ℓ)∥ from (46), and the third inequality is due to (30) and (55), respectively. It then follows that
α(k+ℓ+1) ≤ α(k+ℓ) exp((73/575) ρ3^ℓ) ≤ α(k) exp((73/575) · 1/(1 − ρ3)),   exp((73/575) · 1/(1 − ρ3)) = 1.309···
for all ℓ = 0, 1, .... Therefore, using (41) and (42), we obtain (53). □

Remark 1. In line 9 of Algorithm 1, λ∗m can be replaced with other real numbers in a particular range, for which the quadratic convergence can still be proved. More precisely, for i such that λ∗i is a multiple eigenvalue, we introduce
µ ∈ [λ∗1, λ∗i−] ∪ [λ∗i+, λ∗n],   (56)
where i−, i+ are defined such that

λ∗i− < λ∗i−+1 = λ∗i,   λ∗i = λ∗i+−1 < λ∗i+.

As an exceptional case, if λ∗i = λ∗1, then µ ∈ [λ∗i+, λ∗n]; similarly, if λ∗i = λ∗n, then µ ∈ [λ∗1, λ∗i−]. In line 9 of Algorithm 1, even if λ∗m is replaced with µ, the quadratic convergence is proved in the same manner as the proof in this section. For the proof, note that the above condition (56) is associated with the following two features of our proof. Firstly, we must choose µ ∈ [λ∗1, λ∗n] in view of the estimation as in (35). Secondly, as in Lemma 4, |µ − λ∗i| ≥ min_{λ∗i≠λ∗j} |λ∗i − λ∗j| is required to obtain Lemma 5.

From our convergence analysis, we see that, if some of the prescribed eigenvalues are very close, ∥F(k)∥ must be very small, as in (27), for the convergence. In addition, note that the convergence rate is very slow for such nearly multiple eigenvalues, as indicated by (53) in Theorem 1. In general, it is very difficult to compute eigenvectors for nearly multiple eigenvalues because it is an ill-conditioned problem.

Finally, we prove that limℓ→∞ Y(k+ℓ) and limℓ→∞ X(k+ℓ) exist. From (48), we see that ∥√B(c∗)(Y(k+ℓ) − Y(k+ℓ+1))∥ is quadratically convergent as ℓ → ∞. In other words, {Y(k+ℓ)}∞_{ℓ=0} is a Cauchy sequence, and hence there exists Y(∞). In addition, there exists X(∞) from X(k) = Y(k)(I + F(k)) and ∥F(∞)∥ = 0.

Acknowledgments

The author thanks the referees for valuable comments. This study was supported by KAKENHI, Japan Grant Numbers 17K14143 and 17H02826.

References
[1] M. Chu, Inverse eigenvalue problems, SIAM Rev. 40 (1998) 1–39.
[2] M. Chu, G. Golub, Structured inverse eigenvalue problems, Acta Numer. 11 (2002) 1–71.
[3] M. Chu, G. Golub, Inverse Eigenvalue Problems: Theory, Algorithms and Applications, Oxford University Press, Oxford, 2005.
[4] H. Dai, P. Lancaster, Newton's method for a generalized inverse eigenvalue problem, Numer. Linear Algebra Appl. 4 (1997) 1–21.
[5] S. Friedland, J. Nocedal, L. Overton, The formulation and analysis of numerical methods for inverse eigenvalue problems, SIAM J. Numer. Anal. 24 (1987) 634–667.
[6] H. Dai, An algorithm for symmetric generalized inverse eigenvalue problems, Linear Algebra Appl. 296 (1999) 79–98.
[7] H. Dai, Z. Bai, W. Ying, On the solvability condition and numerical algorithm for the parameterized generalized inverse eigenvalue problem, SIAM J. Matrix Anal. Appl. 36 (2015) 707–726.
[8] K. Aishima, A quadratically convergent algorithm based on matrix equations for inverse eigenvalue problems, Linear Algebra Appl. 542 (2018) 310–333.
[9] T. Ogita, K. Aishima, Iterative refinement for symmetric eigenvalue decomposition, Jpn. J. Ind. Appl. Math. 35 (2018) 1007–1035.
[10] K. Aishima, A quadratically convergent algorithm for inverse eigenvalue problems with multiple eigenvalues, Linear Algebra Appl. 549 (2018) 30–52.
[11] G.H. Golub, C.F. Van Loan, Matrix Computations, fourth ed., The Johns Hopkins University Press, Baltimore, 2013.