Journal of Sound and Vibration (1980) 73(3), 353-362
A NUMERICAL STUDY OF THE SOLUTION OF THE PARTIAL EIGENVALUE PROBLEM

M. PAPADRAKAKIS

Institute of Structural Analysis and Aseismic Research, National Technical University, Athens, Greece

(Received 5 June 1979, and in revised form 22 April 1980)
The partial eigenvalue problem that arises from the application of the finite element method is considered. A number of iterative methods are examined which consist in seeking the stationary points of the Rayleigh quotient, thus avoiding the physical assembly of the matrices involved. The computational efficiencies of the steepest descent, the conjugate gradient and the co-ordinate overrelaxation methods are compared. Several modifications of the original conjugate gradient algorithm, as well as an orthogonalization process, are studied. The dependence of the convergence of the methods on the initial estimate of the eigenvectors and, in the case of the co-ordinate overrelaxation, on different values of the relaxation parameter is also examined.
1. INTRODUCTION
The partial or special [1] eigenvalue problem consists in determining one, or a few, eigenvalues λ and the corresponding eigenvectors x of the generalized eigenproblem

Ax = λBx,   (1)
in contrast to the complete eigenvalue problem where all the eigenvalues and the corresponding eigenvectors are required. Such eigenvalue equations are encountered in a wide class of engineering problems. The most important eigenvalue problem arising in vibration analysis of structural systems is the calculation of the lowest eigenvalues and corresponding eigenvectors of equation (1), where A is the symmetric positive definite (if all rigid body modes have been removed from the system) stiffness matrix K and B is the symmetric mass matrix M. The eigenvalues are the squares of the free vibration frequencies, and the eigenvectors represent the corresponding vibration mode shapes. In structural stability analysis the largest eigenvalue of equation (1), with A now the symmetric geometrical stiffness matrix K_G and B the symmetric positive definite elastic stiffness matrix K_E, gives the lowest critical load, and the eigenvectors are the corresponding buckling modes. The partial eigenproblem also arises in quantum mechanics, in the investigation of the electronic structure of atoms and molecules by the configuration interaction method, where A is a very large sparse symmetric matrix and B = I. When finite element or finite difference spatial idealizations are used the resulting matrices A and B are sparse and usually too large for an in-core computation. In the analysis of three-dimensional solids the number of variables may reach several thousands, and in highly interconnected structures the matrices may lose their distinct band form. In such cases it is preferable, from the point of view of computational effort and storage requirements, to apply iterative methods rather than transformation methods when only a few eigenvalues and corresponding eigenvectors of equation (1) are required.
A particular class of iterative methods, namely the gradient and relaxation methods, is based on the extremal properties of eigenvalues and is applicable only to symmetric matrices. Methods based on this idea generate a sequence of vectors which progressively approach the maximum or minimum of the Rayleigh quotient f(x) = x^T A x / x^T B x, which is equal to an eigenvalue when x is the corresponding eigenvector. Such methods do not require the matrices A and B explicitly, but only the vectors Ax and Bx, and so it is possible to build up these products progressively, without assembling the overall A and B matrices. In these methods the same methodology is used and the same iterative process is performed; they differ only in the way the search directions, along which the function is minimized, are selected. In the context of such a unified view, a number of gradient and relaxation methods can be examined and their computational efficiencies compared. The purpose of this paper is to present a state of the art account of these methods, to assess their advantages and shortcomings when applied to different problems, and to locate the areas of suitability of each method.

2. MATHEMATICAL FORMULATION

2.1. STEEPEST DESCENT METHOD

The problem of finding the algebraically largest (λ_max) or smallest (λ_min) eigenvalue is connected with the problem of maximizing or minimizing the Rayleigh quotient
f(x) = x^T A x / x^T B x,   (2)
having the readily evaluated derivative

g(x) = 2[Ax - f(x)Bx] / x^T B x.   (3)
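As a concrete illustration, equations (2) and (3) translate directly into code. The following minimal NumPy sketch is illustrative only: the small diagonal matrix A and B = I are stand-ins rather than matrices from the test problems considered later, and the check simply confirms that the gradient vanishes at an eigenvector.

```python
import numpy as np

def rayleigh_quotient(A, B, x):
    """Equation (2): f(x) = x'Ax / x'Bx."""
    return (x @ (A @ x)) / (x @ (B @ x))

def rayleigh_gradient(A, B, x):
    """Equation (3): g(x) = 2[Ax - f(x)Bx] / x'Bx."""
    f = rayleigh_quotient(A, B, x)
    return 2.0 * (A @ x - f * (B @ x)) / (x @ (B @ x))

# Illustrative stand-in problem: A symmetric positive definite, B = I.
A = np.diag([1.0, 2.0, 5.0, 10.0])
B = np.eye(4)
x = np.ones(4)
print(rayleigh_quotient(A, B, x))             # 4.5, between lambda_min = 1 and lambda_max = 10
print(rayleigh_gradient(A, B, np.eye(4)[0]))  # zero vector: e_1 is the eigenvector of lambda_min
```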
In the method of steepest descent one looks along the direction of the gradient g_i for a better approximation x_{i+1} than the previous estimate x_i, namely that which minimizes the function in this direction. Let

x_{i+1} = x_i + α_i g_i   (4)

be the expression for the new approximation. Then the optimum values for α_i are given by the condition

∂f(x_i + α_i g_i)/∂α_i = 0.   (5)
The values for α_i, corresponding respectively to the search for the maximum (α_{i1}) or the minimum (α_{i2}), are given by (for typographical brevity the iteration index i is dropped)

α_1 = -2/(ad + √Δ),   α_2 = 2/(-ad + √Δ),   (6a, b)

with

Δ = a²d² - 4(bd - c),

a = g^T B g / g^T g,   b = g^T B x / x^T B x,   c = g^T B g / x^T B x,   d = g^T A g / g^T B g - x^T A x / x^T B x.   (7)
A modified version of the method is obtained when a relaxation coefficient γ is used in equation (4),

x_{i+1} = x_i + γα_i g_i,   0 < γ < 1.   (8)

The method is then called incomplete or relaxed steepest descent.
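A minimal Python sketch of the relaxed steepest descent iteration follows. Rather than transcribing the closed-form coefficients (6) and (7), it solves the stationarity condition (5) directly as a quadratic in α, which is algebraically equivalent; the two roots give the maximizing and the minimizing step. The test matrix, tolerance handling (the criterion (29) of section 3) and iteration limit are assumptions made for illustration.

```python
import numpy as np

def optimal_alpha(A, B, x, p, minimize=True):
    # Stationarity (5) of f(x + a p), written out as a quadratic in a:
    # (A2*B1 - A1*B2) a^2 + (A2*B0 - A0*B2) a + (A1*B0 - A0*B1) = 0,
    # with A0 = x'Ax, A1 = p'Ax, A2 = p'Ap and B0, B1, B2 likewise;
    # this is algebraically equivalent to equations (6) and (7).
    A0, A1, A2 = x @ A @ x, p @ A @ x, p @ A @ p
    B0, B1, B2 = x @ B @ x, p @ B @ x, p @ B @ p
    roots = np.roots([A2 * B1 - A1 * B2, A2 * B0 - A0 * B2, A1 * B0 - A0 * B1])
    roots = roots[np.isreal(roots)].real
    f = lambda a: ((x + a * p) @ A @ (x + a * p)) / ((x + a * p) @ B @ (x + a * p))
    key = np.argmin if minimize else np.argmax
    return roots[key([f(a) for a in roots])]

def relaxed_steepest_descent(A, B, x, gamma=1.0, eps=1.0e-6, max_iter=20000):
    # gamma = 1 recovers the plain method (4); 0 < gamma < 1 gives equation (8).
    f_old = (x @ A @ x) / (x @ B @ x)
    for i in range(1, max_iter + 1):
        g = 2.0 * (A @ x - f_old * (B @ x)) / (x @ B @ x)   # equation (3)
        x = x + gamma * optimal_alpha(A, B, x, g) * g       # equations (4) and (8)
        f_new = (x @ A @ x) / (x @ B @ x)
        if abs(f_old - f_new) <= eps * abs(f_old):          # criterion (29)
            return f_new, x / np.linalg.norm(x), i
        f_old = f_new
    return f_old, x / np.linalg.norm(x), max_iter

A = np.diag([1.0, 2.0, 5.0, 10.0])   # illustrative stand-in, B = I
lam, v, n_iter = relaxed_steepest_descent(A, np.eye(4), np.ones(4), gamma=0.6)
print(lam, n_iter)                   # converges towards lambda_min = 1
```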
2.2. CONJUGATE GRADIENT METHOD
In the conjugate gradient method the next approximation x_{i+1} to the eigenvector is sought along a direction p_i such that

x_{i+1} = x_i + α_i p_i,   (9)

with α_i being fixed by the condition that f(x_{i+1}) attains a minimum or a maximum along p_i. The search direction itself is generated recursively by

p_i = g_i + β_{i-1} p_{i-1},   (10)

where

β_{i-1} = g_i^T g_i / g_{i-1}^T g_{i-1}.   (11)
Explicit formulae for α_i can be generated as in the case of steepest descent, by substituting the values of x_i and p_i, obtained from equations (9) and (10), into the equation ∂f(x_i + α_i p_i)/∂α_i = 0. The values for α_i are given by equations (6), and the coefficients of equations (7) now take the forms

a = p^T B p / g^T g,   b = p^T B x / x^T B x,   c = p^T B p / x^T B x,   d = p^T A p / p^T B p - x^T A x / x^T B x.   (12)
A different evaluation of the scalar parameter β_i has been derived by Fried [2], by minimizing f(x) not only with respect to α_i but also with respect to β_i. An explicit expression can be obtained from the minimization of the Rayleigh quotient with respect to β_i. The change δf(x_i) in f(x_i) due to the change of x_i into x_{i+1}, according to equation (9), is given by (with the i index dropped)

δf(x) = (g^T g / x^T B x)[(adα² + 2α)/(1 + 2bα + cα²)].   (13)

Substituting α_2 from equation (6b) into equation (13) gives

δf(x) = (g^T g / x^T B x)[α/(1 + bα)].   (14)

The best change in the function with respect to β_i is obtained from

∂δf(x_i)/∂β_i = 0,   (15)

which yields the following value for the scalar parameter:

β_i = -[p_i^T(A - f_{i+1}B)g_{i+1} - g_{i+1}^T g_{i+1} p_i^T B x_{i+1}/x_{i+1}^T B x_{i+1}] / p_i^T(A - f_{i+1}B)p_i.   (16)

With (x_{i+1} - x_i)^T B x_i neglected, equation (16), corresponding to the search for the minimum, reduces to

β_i = -p_i^T(A - f_{i+1}B)g_{i+1} / p_i^T(A - f_{i+1}B)p_i.   (17)

Substitution of α_1 from equation (6a) into equation (13) does not give an explicit expression for β_i as in equation (16), but after neglecting some terms β_i may be approximated by equation (17) for the maximum eigenvalue as well. In reference [3] another expression was proposed for the parameter β_i. This expression was obtained from the H-orthogonality condition p_{i+1}^T H p_i = 0, with H representing the Hessian matrix of the local second order derivatives of the function, and is given by β_i = -g_{i+1}^T H p_i / p_i^T H p_i, which may be written as

β_i = -[g_{i+1}^T(A - f_{i+1}B)p_i - g_{i+1}^T g_{i+1} x_{i+1}^T B p_i] / p_i^T(A - f_{i+1}B)p_i.   (18)
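The iteration (9)-(11) can be sketched in the same style as before. In the code below the flag fried_beta switches from the recurrence (11) to the simplified Fried expression (17), that is, to what section 3 calls EIGFR1; the line search again solves the stationarity quadratic directly, equivalent to using equations (6) with the coefficients (12). The test matrix is an illustrative stand-in.

```python
import numpy as np

def cg_rayleigh_min(A, B, x, eps=1.0e-6, max_iter=10000, fried_beta=False):
    # Conjugate gradient minimization of the Rayleigh quotient; beta from
    # the recurrence (11) by default, or from Fried's formula (17) when
    # fried_beta=True (the EIGFR1 variant of section 3).
    f = lambda y: (y @ A @ y) / (y @ B @ y)

    def line_extremum(x, p):
        # df(x + a p)/da = 0, cf. equations (6) and (12), solved directly.
        A0, A1, A2 = x @ A @ x, p @ A @ x, p @ A @ p
        B0, B1, B2 = x @ B @ x, p @ B @ x, p @ B @ p
        r = np.roots([A2 * B1 - A1 * B2, A2 * B0 - A0 * B2, A1 * B0 - A0 * B1])
        r = r[np.isreal(r)].real
        return min(r, key=lambda a: f(x + a * p))

    fx = f(x)
    g = 2.0 * (A @ x - fx * (B @ x)) / (x @ B @ x)
    p = g.copy()
    for i in range(1, max_iter + 1):
        x = x + line_extremum(x, p) * p                    # equation (9)
        f_new = f(x)
        g_new = 2.0 * (A @ x - f_new * (B @ x)) / (x @ B @ x)
        if abs(fx - f_new) <= eps * abs(fx):
            return f_new, x / np.linalg.norm(x), i
        if fried_beta:
            H = A - f_new * B                              # (A - f_{i+1} B) in (17)
            beta = -(p @ H @ g_new) / (p @ H @ p)
        else:
            beta = (g_new @ g_new) / (g @ g)               # equation (11)
        p = g_new + beta * p                               # equation (10)
        fx, g = f_new, g_new
    return fx, x / np.linalg.norm(x), max_iter

A = np.diag([1.0, 2.0, 5.0, 10.0])                         # illustrative stand-in
print(cg_rayleigh_min(A, np.eye(4), np.ones(4))[0])        # approaches lambda_min = 1
```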
2.3. CO-ORDINATE RELAXATION METHOD
The concept of co-ordinate relaxation was presented by Faddeev and Faddeeva [1] and has recently been used by several authors [4-8]. In this method, which is also called the method of alternate directions or of optimal relaxations, the search direction w is taken successively as the different base vectors e_m, with m = 1, 2, ..., n, where n is the order of A. The new estimate x_{i+1} is now given by

x_{i+1} = x_i + α_i w_i,   w_i = (0, ..., 1, ..., 0)^T.   (19a, b)
The vectors w_i are chosen cyclically, and the sequence of n single steps constitutes one iteration. The optimum values for α_i are obtained by the same process as before, and the resultant quadratic equation is given by

aα² + bα + c = 0,   (20)

where

a = w^T A w x^T B w - x^T A w w^T B w,
b = w^T A w x^T B x - x^T A x w^T B w,
c = x^T A w x^T B x - x^T A x x^T B w.   (21)

The expressions (21) take on particularly simple forms when w is substituted from equation (19b):

a = a_mm v_m - u_m b_mm,   b = a_mm x^T B x - x^T A x b_mm,   c = u_m x^T B x - x^T A x v_m,   (22)

where a_mm and b_mm are the mth diagonal terms of A and B, and u_m and v_m are the mth terms of Ax and Bx. A slightly different approach may be adopted if one looks for the extremum of the Rayleigh quotient in the family of vectors

x_{i+1} = c_1 x_i + c_2 w_i,   (23)

which eventually leads to the following algorithm: compute f from the quadratic equation

(x^T B x b_mm - v_m²)f² - (b_mm x^T A x + a_mm x^T B x - 2u_m v_m)f + a_mm x^T A x - u_m² = 0;

then

x_{i+1} = x_i + αw_i,   α = c_2/c_1 = (fv_m - u_m)/(a_mm - fb_mm).   (24)
Schwarz [7] has noticed the similarity between this method and the successive overrelaxation method for solving a system of linear equations, and has shown that when a relaxation coefficient is used, so that equation (19) is replaced by

x_{i+1} = x_i + ωα_i w_i,   (25)

the asymptotic convergence of the process is guaranteed for all relaxation factors 0 < ω < 2. He called this method co-ordinate overrelaxation (for ω > 1). However, the optimal choice of ω can be discussed theoretically only in the case that A and B have "property A".
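A Python sketch of one possible implementation follows. Each single step uses the coefficients (22) of the quadratic (20), chooses the root that lowers the Rayleigh quotient, and applies the relaxation factor ω of equation (25); the products Ax and Bx are carried along by rank-one updates, in the spirit of equation (28) of section 2.4. The stand-in matrix and the sweep-based stopping test are assumptions for illustration.

```python
import numpy as np

def coordinate_overrelaxation(A, B, x, omega=1.0, eps=1.0e-6, max_sweeps=10000):
    # Co-ordinate (over)relaxation for lambda_min, equations (19)-(22) and (25);
    # one sweep over the n co-ordinates counts as one iteration.
    x = x.astype(float).copy()
    n = x.size
    Ax, Bx = A @ x, B @ x                  # carried along incrementally, cf. (28)
    xAx, xBx = x @ Ax, x @ Bx
    f_old = xAx / xBx
    f_line = lambda u, v, amm, bmm, t: ((xAx + 2*t*u + t*t*amm) /
                                        (xBx + 2*t*v + t*t*bmm))
    for sweep in range(1, max_sweeps + 1):
        for m in range(n):
            amm, bmm, um, vm = A[m, m], B[m, m], Ax[m], Bx[m]
            # equation (22): coefficients of the quadratic (20), a t^2 + b t + c = 0
            a = amm * vm - um * bmm
            b = amm * xBx - xAx * bmm
            c = um * xBx - xAx * vm
            disc = b * b - 4.0 * a * c
            if abs(a) < 1.0e-30 or disc < 0.0:
                continue                   # no proper stationary point on this line
            roots = ((-b + np.sqrt(disc)) / (2.0 * a),
                     (-b - np.sqrt(disc)) / (2.0 * a))
            alpha = min(roots, key=lambda t: f_line(um, vm, amm, bmm, t))
            d = omega * alpha              # equation (25); omega = 1 is plain relaxation
            x[m] += d
            Ax += d * A[:, m]              # rank-one updates of the products
            Bx += d * B[:, m]
            xAx += 2.0 * d * um + d * d * amm
            xBx += 2.0 * d * vm + d * d * bmm
        f_new = xAx / xBx
        if abs(f_old - f_new) <= eps * abs(f_old):   # criterion (29)
            return f_new, x / np.linalg.norm(x), sweep
        f_old = f_new
    return f_old, x / np.linalg.norm(x), max_sweeps

A = np.diag([1.0, 2.0, 5.0, 10.0]) + 0.1 * np.ones((4, 4))  # illustrative stand-in
print(coordinate_overrelaxation(A, np.eye(4), np.ones(4), omega=1.0)[0])
```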
2.4. CONVERGENCE AND INITIAL APPROXIMATION
Bradbury and Fletcher [9] proposed a method of reducing the number of variables by one in order to utilize to the full the property of the quadratic convergence of the minimization methods when applied to the homogeneous form of the Rayleigh quotient.
The vectors x_i are restricted to lie on a convex surface containing the origin of an n-dimensional space, typically the "unit sphere" of some norm of x_i. The simplest way of effecting the transformation is to use the maximum norm (max_m |x_m| = 1), for which the convex surface becomes an intersection of planes producing a hypercube having 2n faces, with variables directly related to the elements of x_i. It is important to find a good starting value in order to avoid wasting time in trying to find the appropriate face of the hypercube upon which to minimize the function. A convenient initial starting point is one of the unit vectors e_m. Then the Rayleigh quotient is given by

f(e_m) = e_m^T A e_m / e_m^T B e_m = a_mm / b_mm.   (26)

The gradient vector is given by

g(e_m) = 2[a_m - f(e_m)b_m] / b_mm,   (27)

where a_m and b_m are the mth column vectors of A and B respectively. Hence the e_m which is chosen as a starting point is that which corresponds to the minimum or maximum (depending on whether the search is for the minimum or maximum eigenvalue) ratio of the diagonal elements, a_mm/b_mm. When using this procedure, the elements of x_i are examined at each iteration and normalized by the appropriate scalar which reduces the element of largest absolute value to unity. However, the procedure described by Geradin [3] avoids the need for this artificial normalization, because of the explicit implementation of the Hessian matrix, with the conjugate directions built up without reference to the length of the vectors x_i. The products Ax_{i+1} and Bx_{i+1} may be calculated recursively, instead of carrying out the multiplications at each iteration, from the relationships

Ax_{i+1} = Ax_i + α_i As_i,   Bx_{i+1} = Bx_i + α_i Bs_i,   (28)

where s_i is the direction along which the function is minimized in each case. It is advisable, however, to compute these products directly every so often, so as to prevent the domination of rounding-off errors, particularly when dealing with ill-conditioned problems.
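The effect of equation (28) is easily demonstrated. In the short sketch below, random vectors stand in for the search directions s_i, and the products are recomputed directly every twenty steps, as recommended above; the refresh interval and the matrices are assumptions for illustration. The final check confirms that the recursively updated product agrees with the directly computed one to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M + M.T + 2.0 * n * np.eye(n)        # illustrative symmetric stand-in
B = np.eye(n)

x = np.ones(n)
Ax, Bx = A @ x, B @ x
refresh = 20                              # direct recomputation interval (assumed)
for i in range(1, 101):
    s = rng.standard_normal(n)            # stand-in for a search direction s_i
    alpha = 0.01
    x = x + alpha * s
    if i % refresh == 0:
        Ax, Bx = A @ x, B @ x             # flush accumulated round-off
    else:
        Ax += alpha * (A @ s)             # equation (28)
        Bx += alpha * (B @ s)
print(np.linalg.norm(Ax - A @ x))         # remains at round-off level
```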
3. NUMERICAL TESTS
The efficiencies of the aforementioned methods were compared in three test cases. Since in all methods the same methodology is followed for solving the general eigenvalue problem, it can be assumed, without loss of generality, that matrix B is the unit matrix. This simplification does not affect the comparative efficiencies of the methods, since the effort for evaluating the products Bx_i is proportional to the final computer time of each method. In this comparative study only the two extreme eigenvalues of matrix A have been evaluated. When more than one eigenvalue is required, a number of similar techniques can be applied to both main categories of methods discussed: for example, the method of deflation and the gradient projection method as successive methods, and the s-step gradient or the group co-ordinate overrelaxation for a simultaneous evaluation of a number of eigenvalues and their corresponding eigenvectors. The evaluation of λ_max, in addition to that of λ_min, provides some useful information about the performance of the methods when the eigenvalue spectrum is sparsely populated in the area of the eigenvalue that one seeks to evaluate. Comparisons have been made in calculating the extreme eigenvalues of the stiffness matrices of two cable structures and an RC frame. The test problems were selected so as to produce matrices A with different degrees of freedom, condition numbers and sparsities. Example 1 has 36 degrees of freedom (DOF) and the corresponding condition number of A is ρ = 7860; example 2 has 75 DOF with ρ = 55; and example 3 has 90 DOF and ρ = 5660. The matrices were diagonally scaled. Iterations were continued until the error tolerance criterion

|λ_i - λ_{i+1}| / |λ_i| ≤ ε   (29)

was satisfied, where λ_i is the value of the eigenvalue during the ith iteration. All methods used are listed in Table 1. The letter "B" after the abbreviated name of a method in subsequent tables means that the Bradbury and Fletcher orthogonalization process is applied in each iteration. Since the number of iterations is not by itself a measure of the relative efficiency of the methods, the calculations performed inside an iterative loop being different for each method, the computer time was also incorporated into the results as a more reliable measure of comparison. For this reason care was taken that all computer programs be constructed in a similar way. Tables 2-6 show the number of iterations and the computer time in seconds required to meet different values of the error tolerance ε. From these results one can see that the EIGCG and EIGGER methods gave very similar results. The Bradbury and Fletcher orthogonalization technique improved the convergence of EIGFR2 for smaller values of ε, but this did not happen with EIGCG.
TABLE 1

Methods used and their abbreviated names

Abbreviated name    Description of the method
EIGSD               Steepest descent method
EIGCG               Conjugate gradient method
EIGFR1              Conjugate gradient method with equation (17) for β
EIGFR2              Conjugate gradient method with equation (16) for β
EIGGER              Conjugate gradient method with equation (18) for β
EIGCR               Co-ordinate overrelaxation method
TABLE 2

Convergence to λ_min (example 1)

                    ε = 0.1E-00    ε = 0.1E-02     ε = 0.1E-04      ε = 0.1E-06
Method              N_iter  Time   N_iter  Time    N_iter   Time    N_iter   Time
EIGSD (γ = 0.6)        5   0.012     33   0.074      557    1.257     1070    2.146
EIGCG                 10   0.016     13   0.020      165    0.254      205    0.319
EIGCG-B                2   0.016      4   0.032     1302    9.995     1442   11.429
EIGFR1                 4   0.005     95   0.127    13455   17.990    22032   29.500
EIGFR2                 6   0.014     28   0.070      151    0.372      526    1.296
EIGFR2-B               6   0.010     58   0.096      109    0.180      144    0.239
EIGGER                 -   0.017     17   0.029      146    0.251      177    0.305
EIGCR (ω = 1.0)        -   0.030     48   0.074     4367    6.707    16854   25.880
EIGCR (ω = 1.9)       11   0.017     21   0.032      746    1.145     1179    1.810
TABLE 3

Convergence to λ_min (example 2)

                    ε = 0.1E-00    ε = 0.1E-02     ε = 0.1E-04      ε = 0.1E-06
Method              N_iter  Time   N_iter  Time    N_iter   Time    N_iter   Time
EIGSD (γ = 0.6)        5   0.042     34   0.289      53    0.450       94    0.798
EIGCG                  7   0.039     18   0.101      26    0.146       67    0.376
EIGCG-B                7   0.046     15   0.099      26    0.171       33    0.217
EIGFR1                 5   0.028     55   0.401     125    0.701      203    1.138
EIGFR2                 6   0.033     19   0.107      26    0.147       61    0.349
EIGFR2-B               8   0.053     16   0.106      27    0.176       34    0.224
EIGGER                 7   0.039     18   0.101      26    0.146       63    0.352
EIGCR (ω = 1.0)        5   0.027     28   0.149      52    0.277     1820    9.707
EIGCR (ω = 1.9)        2   0.011     40   0.213      65    0.347      738    3.936
TABLE 4

Convergence to λ_max (example 1)

                    ε = 0.1E-00    ε = 0.1E-02     ε = 0.1E-04      ε = 0.1E-06
Method              N_iter  Time   N_iter  Time    N_iter   Time    N_iter   Time
EIGSD (γ = 0.6)        5   0.012     13   0.029      21    0.047       33    0.075
EIGCG                  4   0.006      9   0.014      14    0.022       23    0.036
EIGCG-B                2   0.016      4   0.032       6    0.048      136    1.078
EIGFR1                 4   0.007     13   0.022      32    0.052       55    0.090
EIGFR1-B               2   0.003      8   0.013      45    0.075       81    0.134
EIGGER                 2   0.003     10   0.017      15    0.026       25    0.043
EIGCR (ω = 1.0)        2   0.003      8   0.012      54    0.083      118    0.181
EIGCR (ω = 1.90)       2   0.003      9   0.013      23    0.035       36    0.055
TABLE 5

Convergence to λ_max (example 2)

                    ε = 0.1E-00    ε = 0.1E-02     ε = 0.1E-04      ε = 0.1E-06
Method              N_iter  Time   N_iter  Time    N_iter   Time    N_iter   Time
EIGSD (γ = 0.6)        4   0.035     11   0.096      20    0.174      981    8.538
EIGCG                  3   0.017      7   0.039      11    0.062       32    0.179
EIGCG-B                2   0.013      5   0.033      13    0.086       33    0.217
EIGFR1                 3   0.018     10   0.056      19    0.107      981    6.451
EIGFR1-B               2   0.019      8   0.075      20    0.241      335    3.151
EIGGER                 2   0.011      9   0.050      12    0.067       33    0.185
EIGCR (ω = 1.00)       3   0.016      5   0.027      38    0.203      512    2.730
EIGCR (ω = 1.90)       2   0.011      9   0.048      20    0.107      112    0.597
TABLE 6

Convergence to λ_min and λ_max (example 3)

                           ε = 0.1E-00    ε = 0.1E-02     ε = 0.1E-04      ε = 0.1E-06
Method                     N_iter  Time   N_iter  Time    N_iter   Time    N_iter   Time
EIGCG            λ_min        5   0.052     49   0.513      63    0.659       87    0.910
                 λ_max        2   0.021     17   0.178      35    0.366       45    0.471
EIGGER           λ_min        5   0.060     50   0.579      64    0.741       88    1.019
                 λ_max        2   0.020     18   0.208      35    0.405       49    0.568
EIGCR (ω = 1.9)  λ_min       23   0.242     39   0.412      52    0.549       79    0.834
                 λ_max        2   0.021     10   0.106      21    0.222       33    0.348
In Figures 1(a) and (b) the total number of iterations is plotted as a function of the relaxation parameter ω for EIGCR applied to examples 1 and 3, respectively. An optimal or near optimal relaxation parameter can dramatically improve the convergence of the method. The two versions presented in equations (19), (20) and (24) produced almost identical results in all cases considered. In all the aforementioned studies the initial estimate of the vector x was taken as the unit vector (x_0 = (1, ..., 1)^T). Different starting values, ranging from x_0 = (10, ..., 10)^T to x_0 = (10⁴, ..., 10⁴)^T, were tried in order to study the degree of dependence of the convergence rates and the accuracy of the final results on the initial starting value. Except for those of EIGFR1, the results of all other methods remained unaffected by these changes of the initial vector. For EIGFR1 there was slower convergence only for the λ_min of example 1, and convergence was also to a different minimum from the true one. Two more convergence studies are reported in Tables 7 and 8.
Figure 1. Variation of N_iter with ω (λ_min, ε = 0.1E-06): (a) example 1; (b) example 3.
TABLE 7

Number of iterations required for convergence up to n_q significant decimal digits to the true minimum eigenvalue

                          Example 1                     Example 2
Method            n_q = 2  n_q = 4  n_q = 6    n_q = 2  n_q = 4  n_q = 6
EIGSD (γ = 0.6)      557     1070    10509        34      473      2751
EIGCG                165      191      248        13        -         -
EIGCG-B             1302     1372     1480        15       67        67
EIGFR1             13455    22032    34749        55        -         -
EIGFR2               104      247     1092        15       61       128
EIGFR2-B              96      120      170        14        -         -
EIGGER               146      156      177        15       63       100
EIGCR (ω = 1.0)    14876    16854    20806        28    11108         -
TABLE 8

Number of iterations required for convergence to the minimum eigenvalue of example 1, when using the gradient g and the modified gradient r = Ax - f(x)Bx

                       ε = 0.1E-00   ε = 0.1E-02   ε = 0.1E-04   ε = 0.1E-06
EIGSD (γ = 1.0)   r         5             96           13460          -
                  g         5             96           13460          -
EIGCG             r        10             13             255         314
                  g        10             13             165         205
EIGGER            r         6             22             333         461
                  g        10             17             146         177
4. CONCLUDING REMARKS

On the basis of the results presented, and of other experiments not detailed here, it appears that no single method is best for all problems. The computer time required for the solution of the partial eigenvalue problem depends on the distances between adjacent eigenvalues, the condition number and the order of the matrix A. It is possible, however, to indicate the degree of applicability of the methods in each case. The conjugate gradient type of method performed better where there is an accumulation of eigenvalues around the extreme eigenvalue. On the other hand, co-ordinate overrelaxation, with a proper estimate of the relaxation parameter, may prove very efficient for large sparse matrices with a sparsely populated eigenvalue spectrum. The modifications proposed by Fried in the evaluation of the scalar parameter β did not improve the convergence rate of the original EIGCG. The orthogonalization process gave contradictory results, substantially improving the EIGFR2 version while deteriorating the convergence of EIGCG. The method proposed by Geradin (EIGGER) gave results very close to those of EIGCG.
REFERENCES

1. D. K. FADDEEV and V. N. FADDEEVA 1963 Computational Methods of Linear Algebra. San Francisco: W. H. Freeman and Co.
2. I. FRIED 1972 Journal of Sound and Vibration 20, 333-342. Optimal gradient minimization scheme for finite element eigenproblems.
3. M. GERADIN 1971 Journal of Sound and Vibration 19, 319-331. The computational efficiency of a new minimization algorithm for eigenvalue analysis.
4. S. HOLM 1973 Umeå University Report UMINF-33-73. Coordinate overrelaxation methods for the eigenproblem.
5. I. SHAVITT, C. F. BENDER, A. PIPANO and R. P. HOSTENY 1973 Journal of Computational Physics 11, 90-108. The iterative calculation of several of the lowest or highest eigenvalues and corresponding eigenvectors of very large symmetric matrices.
6. H. R. SCHWARZ 1974 Computer Methods in Applied Mechanics and Engineering 3, 11-28. The eigenvalue problem (A - λB)x = 0 for symmetric matrices of high order.
7. H. R. SCHWARZ 1974 Numerische Mathematik 23, 135-151. The method of coordinate overrelaxation for (A - λB)x = 0.
8. H. R. SCHWARZ 1977 Computer Methods in Applied Mechanics and Engineering 12, 181-199. Two algorithms for treating Ax = λBx.
9. W. W. BRADBURY and R. FLETCHER 1966 Numerische Mathematik 9, 259-267. New iterative methods for solution of the eigenproblem.
10. M. PAPADRAKAKIS 1978 Ph.D. Thesis, The City University, London. Gradient and relaxation nonlinear techniques for the analysis of cable supported structures.
11. L. FOX 1965 Introduction to Numerical Analysis. Oxford University Press.
12. M. R. HESTENES and W. KARUSH 1951 Journal of Research of the National Bureau of Standards 47, 45-61. A method of gradients for the calculation of the characteristic roots and vectors of a real symmetric matrix.
13. I. FRIED 1969 American Institute of Aeronautics and Astronautics Journal 4, 739-741. Gradient methods for finite element eigenproblems.
14. A. JENNINGS 1977 Matrix Computation for Engineers and Scientists. Chichester, New York, Brisbane, Toronto: John Wiley.
15. J. H. WILKINSON and C. REINSCH 1971 Handbook for Automatic Computation, Volume II: Linear Algebra. Berlin, Heidelberg, New York: Springer-Verlag.
16. B. T. SMITH, J. M. BOYLE, B. S. GARBOW, Y. IKEBE, V. C. KLEMA and C. B. MOLER 1976 Matrix Eigensystem Routines - EISPACK Guide. Berlin, Heidelberg, New York: Springer-Verlag.