Chapter 9
Iterative Methods of Solution

Iterative or indirect methods are based on reformulating Ax = b as

$$x^{(n+1)} = Mx^{(n)} + c$$

where n refers to the iteration number, M is known as the iteration matrix and c is a new vector formed by the reformulation process. This approach requires that x^(n) converges to x, the exact solution, i.e.

$$x = \lim_{n\to\infty} x^{(n)}.$$
Convergence depends on the choice of M and should not depend critically upon the choice of x^(0). Ideally, the choice of x^(0) should be based on a priori information about x but, in practice, we can set x^(0) = 0.
Advantages of Iterative Methods

(i) The iteration matrix is virtually unaltered by most methods.
(ii) The methods are generally self-correcting.
(iii) The methods are generally simple to program and require little additional storage.
(iv) The methods lend themselves to the application of parallel processing.
Disadvantages of Iterative Methods

(i) In general, there is no guarantee of convergence.
(ii) Even when the method converges, the number of iterations can be large.
(iii) The computing time can be large if many iterations are required and the system of equations is large.

As a general rule of thumb, iterative methods are recommended only for large linear systems of equations Ax = b when A is a large sparse matrix. Systems of this type,
frequently occur in numerical solutions to partial differential equations using finite difference analysis and (to a lesser extent) finite element analysis. They also occur in cases when the impulse response function of a system is characterized by a matrix which is banded and sparse. The types of iterative techniques that are discussed in this chapter include: Jacobi's method, the Gauss-Seidel method, the relaxation method and the conjugate gradient method, which is a semi-iterative technique. The conditions for convergence (which are discussed later on in this chapter) are as follows:

(i) A sufficient condition for convergence is that A is strictly diagonally dominant, i.e.

$$|a_{ii}| > \sum_{j=1, j\neq i}^{n} |a_{ij}|.$$
(ii) A necessary and sufficient condition for convergence is that

$$\rho(M) < 1$$

where ρ(M) is the 'spectral radius', given by

$$\rho(M) = \max_i |\lambda_i|.$$

Here, λ_i are the eigenvalues of M. The largest eigenvalue of the iteration matrix must therefore be less than unity in modulus to ensure convergence.
9.1 Basic Ideas
Consider the following system of equations:

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$
We can rewrite this set of equations as

$$x_1 = \frac{1}{a_{11}}(b_1 - a_{12}x_2 - \dots - a_{1n}x_n),$$
$$x_2 = \frac{1}{a_{22}}(b_2 - a_{21}x_1 - \dots - a_{2n}x_n),$$
$$\vdots$$
$$x_n = \frac{1}{a_{nn}}(b_n - a_{n1}x_1 - \dots - a_{n,n-1}x_{n-1}).$$

Clearly this requires that a_ii ≠ 0, i.e. the diagonal elements are non-zero.
9.1.1 Jacobi's Method

Jacobi's method involves the most basic iteration process of the type:

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1, j\neq i}^{n} a_{ij}x_j^{(k)}\Big), \quad i = 1, 2, \dots, n.$$
9.1.2 The Gauss-Seidel Method

The Gauss-Seidel method involves updating the sub-diagonal elements as the computation proceeds, i.e. the new values of x_j are used as soon as they become available. The iteration process is

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big).$$
9.1.3 The Relaxation or Chebyshev Method

The relaxation method involves using a 'relaxation parameter' to 'tune' the system in order to reduce the number of iterations required. The iteration process is

$$x_i^{(k+1)} = (1-\omega)x_i^{(k)} + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big)$$

where ω is called the 'relaxation parameter'. A suitable choice of this parameter increases the rate of convergence. There are two cases: 1 < ω < 2, giving the so-called 'Successive-Over-Relaxation' or SOR method, and 0 < ω < 1, known as the 'Successive-Under-Relaxation' or SUR method.
9.2 Iterative Methods
In the previous section the basic ideas were introduced. In this section, we develop the approach further. We start with b = Ax where A is an n × n matrix and write this equation in the form

$$b_i = \sum_{j=1}^{n} a_{ij}x_j = \sum_{j=1, j\neq i}^{n} a_{ij}x_j + a_{ii}x_i.$$
Rearranging,

$$x_i = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1, j\neq i}^{n} a_{ij}x_j\Big)$$

and iterating,

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1, j\neq i}^{n} a_{ij}x_j^{(k)}\Big), \quad k = 0, 1, 2, \dots$$

where

$$x_i^{(1)} = \frac{b_i}{a_{ii}}$$

which is equivalent to setting x_i^{(0)} = 0 ∀ i. The above iterative scheme is Jacobi's method. Gauss-Seidel iteration is given by

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big)$$
and the relaxation method is compounded in the result

$$x_i^{(k+1)} = (1-\omega)x_i^{(k)} + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big).$$

It is easy to translate the formulae above into code, which requires a loop to compute the sums, a loop over i and a loop over k. Note that if ω = 1, then the relaxation method reduces to Gauss-Seidel iteration. For ω ≠ 1, the relaxation method can increase the rate of convergence, depending on the characteristics of the system.
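To make the three loops explicit, here is a minimal Python sketch of Jacobi and relaxation iteration (the function and variable names, and the use of NumPy, are illustrative assumptions rather than anything prescribed by the text); setting omega = 1 in the relaxation routine gives Gauss-Seidel iteration.

import numpy as np

def jacobi(A, b, x0=None, max_iter=100, tol=1e-6):
    # Jacobi iteration: every update uses values from the previous sweep only.
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for k in range(max_iter):
        x_new = np.empty(n)
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i][i]
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

def relaxation(A, b, omega=1.0, x0=None, max_iter=100, tol=1e-6):
    # SOR (1 < omega < 2) or SUR (0 < omega < 1); omega = 1 reduces to Gauss-Seidel.
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # x[j] already holds the updated value for j < i (sub-diagonal terms)
            # and the previous value for j > i.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
        if np.max(np.abs(x - x_old)) < tol:
            return x, k + 1
    return x, max_iter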
9.3 Example of the Iteration Method
Suppose we are required to solve

$$10x_1 - x_2 + x_3 = 20,$$
$$-x_1 + 20x_2 - 2x_3 = 16,$$
$$2x_1 - x_2 + 20x_3 = 23.$$

We re-write this system as

$$x_1 = (20 + x_2 - x_3)/10,$$
$$x_2 = (16 + x_1 + 2x_3)/20,$$
$$x_3 = (23 - 2x_1 + x_2)/20.$$

Note that the characteristic matrix A of this system is diagonally dominant and that rapid convergence is likely because the off-diagonal elements are small compared with a_ii. Applying Jacobi iteration, we have
$$x_1^{(n+1)} = (20 + x_2^{(n)} - x_3^{(n)})/10,$$
$$x_2^{(n+1)} = (16 + x_1^{(n)} + 2x_3^{(n)})/20,$$
$$x_3^{(n+1)} = (23 - 2x_1^{(n)} + x_2^{(n)})/20,$$

and taking x^(0) = 0, gives

x^(0) = (0, 0, 0)^T,
x^(1) = (2.0, 0.8, 1.15)^T,
x^(2) = (1.965, 1.015, 0.990)^T,
x^(3) = (2.0025, 0.99725, 1.003)^T, ...

Applying Gauss-Seidel iteration, we get
$$x_1^{(n+1)} = (20 + x_2^{(n)} - x_3^{(n)})/10,$$
$$x_2^{(n+1)} = (16 + x_1^{(n+1)} + 2x_3^{(n)})/20,$$
$$x_3^{(n+1)} = (23 - 2x_1^{(n+1)} + x_2^{(n+1)})/20,$$

and with x^(0) = 0, we have

x^(0) = (0, 0, 0)^T,
x^(1) = (2.0, 0.9, 0.995)^T,
x^(2) = (1.995, 0.999475, 1.000024)^T,
x^(3) = (2.000055, 1.000050, 0.999995)^T.

The exact solution vector is x = (2, 1, 1)^T.
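As a quick check of the figures above, the 3 × 3 system can be fed to the jacobi and relaxation sketches given earlier (again, the names and the NumPy usage are assumptions for illustration only):

A = np.array([[10.0, -1.0,  1.0],
              [-1.0, 20.0, -2.0],
              [ 2.0, -1.0, 20.0]])
b = np.array([20.0, 16.0, 23.0])

x_j, k_j = jacobi(A, b, tol=1e-4)
x_g, k_g = relaxation(A, b, omega=1.0, tol=1e-4)   # omega = 1: Gauss-Seidel
print("Jacobi      :", x_j, "iterations:", k_j)
print("Gauss-Seidel:", x_g, "iterations:", k_g)

Both runs converge to approximately (2, 1, 1), with Gauss-Seidel requiring fewer iterations.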
9.4 General Formalism
Given Ax = b, we decompose the characteristic matrix A into strictly lower triangular L, diagonal D and strictly upper triangular U matrices, i.e.

A = L + D + U

or

Characteristic Matrix = Lower triangular + Diagonal + Upper triangular.

Then,

$$(L + D + U)x = b,$$
$$Dx = -Lx - Ux + b,$$

and

$$x = -D^{-1}(L + U)x + D^{-1}b,$$

where

$$D^{-1} = \text{diag}(a_{11}^{-1}, a_{22}^{-1}, \dots, a_{nn}^{-1}).$$
The iteration methods can then be written as follows:

Jacobi iteration

$$x^{(k+1)} = -D^{-1}(L + U)x^{(k)} + D^{-1}b.$$

Gauss-Seidel iteration

$$Dx^{(k+1)} = -Lx^{(k+1)} - Ux^{(k)} + b,$$

or after rearranging,

$$x^{(k+1)} = -(D + L)^{-1}Ux^{(k)} + (D + L)^{-1}b.$$

Relaxation iteration

$$Dx^{(k+1)} = (1-\omega)Dx^{(k)} + \omega(b - Lx^{(k+1)} - Ux^{(k)}),$$

which after rearranging gives

$$x^{(k+1)} = -(D + \omega L)^{-1}[(\omega - 1)D + \omega U]x^{(k)} + \omega(D + \omega L)^{-1}b.$$

These results are compounded in the table below.

Method          Iteration Matrix M                         Vector c
Jacobi          -D^{-1}(L + U)                             D^{-1}b
Gauss-Seidel    -(D + L)^{-1}U                             (D + L)^{-1}b
Relaxation      -(D + ωL)^{-1}[(ω - 1)D + ωU]              ω(D + ωL)^{-1}b
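The entries of this table translate directly into a routine that builds M and c from the L, D, U splitting; the following Python sketch (names assumed, with explicit inverses used for clarity rather than efficiency) is one way to do it.

import numpy as np

def split(A):
    # A = L + D + U with D diagonal, L strictly lower, U strictly upper triangular.
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    U = np.triu(A, k=1)
    return L, D, U

def iteration_matrix(A, b, method="jacobi", omega=1.5):
    # Return (M, c) so that the scheme is x <- M x + c.
    L, D, U = split(A)
    if method == "jacobi":
        M = -np.linalg.inv(D) @ (L + U)
        c = np.linalg.inv(D) @ b
    elif method == "gauss-seidel":
        M = -np.linalg.inv(D + L) @ U
        c = np.linalg.inv(D + L) @ b
    else:  # relaxation (SOR/SUR)
        M = -np.linalg.inv(D + omega * L) @ ((omega - 1.0) * D + omega * U)
        c = omega * np.linalg.inv(D + omega * L) @ b
    return M, c

Iterating x = M @ x + c with any of these pairs reproduces the element-wise loops of Section 9.2.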
SOR vs. SUR

When do we use under- (ω < 1) and over- (ω > 1) relaxation? Experimental results suggest the following: (i) use ω > 1 when Gauss-Seidel iteration converges monotonically (the most common case); (ii) use ω < 1 when Gauss-Seidel iteration is oscillatory. Note that, as a general rule of thumb, if A is diagonally dominant, SOR or SUR iteration is only slightly faster than Gauss-Seidel iteration. However, if A is only weakly diagonally dominant, or not diagonally dominant at all, then SOR or SUR can provide a noticeable improvement.
9.5 Criterion for Convergence of Iterative Methods

There are two criteria for convergence:

(i) Sufficient condition: A is strictly diagonally dominant.

(ii) Necessary and sufficient condition: the iteration matrix M has eigenvalues λ_i such that max_i |λ_i| < 1.
9.5.1 Diagonal Dominance

The matrix

$$A = \begin{pmatrix} 5 & 2 & -1 \\ 1 & 6 & -3 \\ 2 & 1 & 4 \end{pmatrix}$$

is diagonally dominant because

$$5 > 2 + |-1|, \quad 6 > 1 + |-3|, \quad 4 > 2 + 1,$$

and the matrix

$$\begin{pmatrix} 1 & 2 & -2 \\ 1 & 1 & 1 \\ 2 & 2 & 1 \end{pmatrix}$$

is not diagonally dominant because

$$1 < 2 + |-2|, \quad 1 < 1 + 1, \quad 1 < 2 + 2.$$
In general, A is diagonally dominant if

$$|a_{ii}| > \sum_{j=1, j\neq i}^{n} |a_{ij}| \quad \forall i.$$
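A diagonal dominance test is a one-line calculation; the sketch below (function name assumed) applies it to the first matrix of this subsection.

import numpy as np

def is_diagonally_dominant(A, strict=True):
    # |a_ii| > (or >=) sum over j != i of |a_ij|, for every row i.
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - diag
    return bool(np.all(diag > off)) if strict else bool(np.all(diag >= off))

print(is_diagonally_dominant([[5, 2, -1], [1, 6, -3], [2, 1, 4]]))  # True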
9.5.2 Sufficient Condition for Convergence

Let us define the error e^(n) at each iteration n as

$$e^{(n)} = x^{(n)} - x$$

where x is the exact solution. Now, since

$$x^{(n+1)} = Mx^{(n)} + c,$$

we have

$$x + e^{(n+1)} = M(x + e^{(n)}) + c.$$

If this process is appropriate to obtain a solution, then we must have

$$x = Mx + c.$$

From the equations above, we get

$$e^{(n+1)} = Me^{(n)},$$

i.e.

$$e^{(1)} = Me^{(0)},$$
$$e^{(2)} = Me^{(1)} = M(Me^{(0)}) = M^2e^{(0)},$$
$$e^{(3)} = Me^{(2)} = M(M^2e^{(0)}) = M^3e^{(0)},$$

and so on, so that e^(n) = M^n e^(0). Now, for global convergence, we require that e^(n) → 0 as n → ∞, or

$$\lim_{n\to\infty} M^n e^{(0)} = 0 \quad \forall e^{(0)}.$$

This will occur if

$$M^n \to 0 \text{ as } n \to \infty.$$

But

$$\|M^n\| \le \|M\|^n$$

and hence, we require that

$$\|M\|^n \to 0 \text{ as } n \to \infty$$
or
$$\|M\| < 1.$$

Consider Jacobi iteration, where

$$M = -D^{-1}(L + U)$$

with elements -a_ij/a_ii for j ≠ i and zeros on the diagonal. For the ℓ_∞ norm,

$$\|M\|_\infty = \max_i \sum_{j=1, j\neq i}^{n} \frac{|a_{ij}|}{|a_{ii}|} < 1$$

and therefore

$$|a_{ii}| > \sum_{j=1, j\neq i}^{n} |a_{ij}|.$$
Diagonal dominance is a sufficient but not a necessary condition for convergence. However, in general, the greater the diagonal dominance, the greater the rate of convergence.
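The argument above is easy to check numerically: the ℓ_∞ norm of the Jacobi iteration matrix is simply the largest row sum of |a_ij|/|a_ii|, so strict diagonal dominance forces it below one. A short sketch (names assumed):

import numpy as np

def jacobi_infinity_norm(A):
    # ||M_J||_inf for M_J = -D^{-1}(L + U), i.e. max_i sum_{j != i} |a_ij| / |a_ii|.
    A = np.asarray(A, dtype=float)
    D = np.diag(np.diag(A))
    M = -np.linalg.inv(D) @ (A - D)
    return np.linalg.norm(M, np.inf)

print(jacobi_infinity_norm([[5, 2, -1], [1, 6, -3], [2, 1, 4]]))  # 0.75 < 1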
9.5.3 Proof of the Necessary Condition for Convergence

For convergence, we require that

$$\lim_{n\to\infty} e^{(n)} = 0.$$

Let M be the iteration matrix associated with a given process and consider the eigenvalue equation

$$Mv_i = \lambda_i v_i$$

where λ_i are the eigenvalues of M and v_i are the eigenvectors of M. (N.B. Eigenvalues and eigenvectors are discussed in the following chapter.) Let us now write the error vector e^(0) as

$$e^{(0)} = \sum_{i=1}^{n} a_i v_i$$

where the a_i are appropriate scalars. From previous results we have

$$e^{(1)} = Me^{(0)} = \sum_{i=1}^{n} a_i M v_i.$$

Hence,

$$e^{(1)} = \sum_{i=1}^{n} a_i \lambda_i v_i$$

since Mv_i = λ_i v_i. Further,

$$e^{(2)} = Me^{(1)} = \sum_{i=1}^{n} a_i \lambda_i M v_i = \sum_{i=1}^{n} a_i \lambda_i^2 v_i$$
and by induction,

$$e^{(n)} = \sum_{i=1}^{n} a_i \lambda_i^n v_i.$$

Now, we want e^(n) → 0 as n → ∞. But this will only occur if

$$\lambda_i^n \to 0 \text{ as } n \to \infty \quad \forall i,$$

a condition which requires that

$$\rho(M) \equiv \max_i |\lambda_i| < 1.$$

The parameter ρ(M) is called the 'spectral radius'. This result leads to the following recipe for establishing the convergence of an iterative process for solving Ax = b: given

$$x^{(n+1)} = Mx^{(n)} + c,$$
(i) find the largest eigenvalue of M;
(ii) if this eigenvalue is < 1 in modulus, then the process converges;
(iii) if this eigenvalue is > 1 in modulus, then the process diverges.

Example Find the spectral radii for M_J and M_G given that

$$A = \begin{pmatrix} -4 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & -4 \end{pmatrix}.$$

The iteration matrix for Jacobi's method is

$$M_J = -D^{-1}(L + U) = \begin{pmatrix} 0 & 1/4 & 0 \\ 1/4 & 0 & 1/4 \\ 0 & 1/4 & 0 \end{pmatrix}.$$
The eigenvalues of M_J can be found by solving the equation (see Chapter 10)

$$|M_J - \lambda I| = 0$$

for λ, i.e.

$$\begin{vmatrix} -\lambda & 1/4 & 0 \\ 1/4 & -\lambda & 1/4 \\ 0 & 1/4 & -\lambda \end{vmatrix} = 0$$

or

$$-\lambda\begin{vmatrix} -\lambda & 1/4 \\ 1/4 & -\lambda \end{vmatrix} - \frac{1}{4}\begin{vmatrix} 1/4 & 1/4 \\ 0 & -\lambda \end{vmatrix} = 0,$$

i.e.

$$-\lambda\Big(\lambda^2 - \frac{1}{16}\Big) + \frac{\lambda}{16} = 0.$$

Thus,

$$\lambda\Big(\lambda^2 - \frac{1}{8}\Big) = 0$$

and

$$\lambda = 0 \;\text{ or }\; \pm\frac{1}{\sqrt{8}}.$$

Hence,

$$\rho(M_J) = \max_i |\lambda_i| = \frac{1}{\sqrt{8}} \approx 0.3536.$$
The Gauss-Seidel iteration matrix is given by

$$M_G = -(D + L)^{-1}U.$$

Using Jordan's method (see Chapter 7) to find (D + L)^{-1}, we get

$$(D + L)^{-1} = -\begin{pmatrix} 1/4 & 0 & 0 \\ 1/16 & 1/4 & 0 \\ 1/64 & 1/16 & 1/4 \end{pmatrix}$$

so that

$$M_G = \begin{pmatrix} 0 & 1/4 & 0 \\ 0 & 1/16 & 1/4 \\ 0 & 1/64 & 1/16 \end{pmatrix}.$$

The eigenvalues of M_G are then obtained by solving

$$\begin{vmatrix} -\lambda & 1/4 & 0 \\ 0 & 1/16 - \lambda & 1/4 \\ 0 & 1/64 & 1/16 - \lambda \end{vmatrix} = 0,$$

giving

$$\lambda = 0, 0 \;\text{ or }\; \frac{1}{8}.$$

Therefore

$$\rho(M_G) = 0.125.$$
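These two spectral radii are easily verified numerically; the following sketch (assuming NumPy) builds M_J and M_G for the matrix A above and computes max_i |λ_i|.

import numpy as np

A = np.array([[-4.0,  1.0,  0.0],
              [ 1.0, -4.0,  1.0],
              [ 0.0,  1.0, -4.0]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

M_J = -np.linalg.inv(D) @ (L + U)
M_G = -np.linalg.inv(D + L) @ U

spectral_radius = lambda M: max(abs(np.linalg.eigvals(M)))
print(spectral_radius(M_J))  # ~0.3536 = 1/sqrt(8)
print(spectral_radius(M_G))  # 0.125 = 1/8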
9.5.4 Estimating the Number of Iterations

The smaller |λ|_max, the faster e^(n) → 0 as n → ∞. Hence, the magnitude of ρ(M) indicates how fast the iteration process converges. This observation can be used to estimate the number of iterations. If n is the number of iterations required to obtain a solution vector accurate to d decimal places, then

$$\rho^n \approx 10^{-d}$$

so that n is given by

$$n \approx -\frac{d}{\log_{10}\rho}.$$
In the SOR method, the iteration matrix is a function of the relaxation parameter ω, and hence the spectral radius ρ is also a function of ω. The optimum value of ω is the one for which ρ is a minimum. In some simple cases, the optimum value of ω can be computed analytically. However, in practice, it is common to sweep through a range of values of ω (in steps of 0.1, for example) using an appropriate computer program and study the number of iterations required to generate a given solution. The relaxation parameter ω is then set at the value which gives the least number of iterations required to produce a solution with a given accuracy.
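Both ideas, estimating n from ρ and sweeping through values of ω, are illustrated by the sketch below (the function names are assumptions, and the example matrix is the one used in the spectral radius example above).

import numpy as np

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

def iterations_needed(rho, d):
    # From rho^n ~ 10^(-d): n ~ -d / log10(rho).
    return int(np.ceil(-d / np.log10(rho)))

def sor_spectral_radius(A, omega):
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    U = np.triu(A, 1)
    M = -np.linalg.inv(D + omega * L) @ ((omega - 1.0) * D + omega * U)
    return spectral_radius(M)

A = np.array([[-4.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, -4.0]])
for omega in np.arange(0.9, 1.4, 0.1):
    rho = sor_spectral_radius(A, omega)
    print(f"omega = {omega:.1f}  rho = {rho:.4f}  n(d = 6) ~ {iterations_needed(rho, 6)}")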
9.5.5 The Stein-Rosenberg Theorem

In general, ρ(M_G) ≤ ρ(M_J). Therefore Gauss-Seidel iteration usually converges faster than Jacobi iteration in those cases where both methods of iteration can be applied. The comparisons between Jacobi and Gauss-Seidel iteration are compounded in the Stein-Rosenberg theorem: if M_J contains no negative elements, then exactly one of the following four possibilities holds.

Convergence:
(i) ρ(M_J) = ρ(M_G) = 0;
(ii) 0 < ρ(M_G) < ρ(M_J) < 1.

Divergence:
(iii) ρ(M_J) = ρ(M_G) = 1;
(iv) 1 < ρ(M_J) < ρ(M_G).
9.6 The Conjugate Gradient Method

The conjugate gradient method can be used to solve Ax = b when A is symmetric positive definite. We start by defining the term 'conjugate'. Two vectors v and w are conjugate with respect to A if

$$v^TAw = 0, \quad v^TAv > 0 \;\; \forall v \neq 0.$$

The basic idea is to find a set of linearly independent vectors (v_1, v_2, ..., v_n) which are conjugate with respect to A, i.e.

$$v_i^TAv_j = 0, \quad i \neq j.$$
If we expand x as a linear series of basis vectors,

$$x = \sum_{j=1}^{n} a_j v_j,$$

then the problem is reduced to finding the coefficients a_j. Now, the equation Ax = b becomes

$$\sum_{j=1}^{n} a_j A v_j = b.$$

Multiplying both sides of this equation by v_i^T we get

$$\sum_{j=1}^{n} a_j v_i^T A v_j = v_i^T b.$$

Now, since v_i^T A v_j = 0 if i ≠ j, we have

$$\sum_{j=1}^{n} a_j v_i^T A v_j = a_i v_i^T A v_i = v_i^T b.$$
Thus,

$$a_i = \frac{v_i^T b}{v_i^T A v_i}$$

and the solution becomes

$$x = \sum_{i=1}^{n} \frac{v_i^T b}{v_i^T A v_i} v_i.$$

This solution only requires the computation of Av_i and the scalar products. Therefore, the computation is easy once the conjugate directions v_i are known. In principle, the conjugate gradient method is a direct method. However, in practice, an iterative solution is required to find the conjugate directions using a 'Gram-Schmidt' type process.
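A direct translation of this result into code: given a matrix V whose columns are mutually conjugate with respect to A, the solution is assembled term by term. The sketch below is illustrative only and assumes such a V is available (for a symmetric A, its eigenvectors are one such set).

import numpy as np

def solve_with_conjugate_directions(A, b, V):
    # x = sum_i (v_i^T b / v_i^T A v_i) v_i, columns of V mutually conjugate w.r.t. A.
    x = np.zeros(len(b))
    for i in range(V.shape[1]):
        v = V[:, i]
        x += (v @ b) / (v @ A @ v) * v
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
_, V = np.linalg.eigh(A)  # eigenvectors of a symmetric matrix are A-conjugate
print(solve_with_conjugate_directions(A, b, V), np.linalg.solve(A, b))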
9.6.1 The Gram-Schmidt Process

Our problem is: given a basis (v_1, v_2, ..., v_n) of the vector space R^n with inner product denoted by (·, ·), find an orthonormal basis (w_1, w_2, ..., w_n) of R^n. The Gram-Schmidt process is based on the following procedure:

$$w_1 = v_1,$$
$$w_2 = v_2 - \frac{(v_2, w_1)}{\|w_1\|^2}w_1,$$
$$w_3 = v_3 - \frac{(v_3, w_2)}{\|w_2\|^2}w_2 - \frac{(v_3, w_1)}{\|w_1\|^2}w_1,$$

and so on, where

$$(x, y) \equiv x^Ty, \quad \|x\| \equiv \sqrt{(x, x)}.$$
The orthonormal basis is then given by

$$\Big(\frac{w_1}{\|w_1\|}, \frac{w_2}{\|w_2\|}, \dots, \frac{w_n}{\|w_n\|}\Big),$$

i.e. each vector w_i is normalised to unit length.
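The procedure translates into a few lines of code; the sketch below (the function name and the trial vectors are assumptions) orthogonalises a list of vectors and then normalises them.

import numpy as np

def gram_schmidt(vectors):
    # w_k = v_k - sum_{j<k} ((v_k, w_j)/||w_j||^2) w_j, then each w_k is normalised.
    ws = []
    for v in vectors:
        v = np.array(v, dtype=float)
        w = v - sum(((v @ u) / (u @ u)) * u for u in ws)
        ws.append(w)
    return [w / np.linalg.norm(w) for w in ws]

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(np.round(np.array(basis) @ np.array(basis).T, 10))  # identity => orthonormal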
9.6.2 Example of the Gram-Schmidt Process

Consider two linearly independent vectors v_1 and v_2. Then w_1 = v_1 and

$$w_2 = v_2 - \frac{(v_2, w_1)}{\|w_1\|^2}w_1,$$

so that w_1^T w_2 = 0, i.e. w_1 and w_2 are orthogonal, and the orthonormal basis is obtained by normalising each vector.
9.6.3 Practical Algorithm for the Conjugate Gradient Method

To solve Ax = b, we begin with an arbitrary starting vector, x_0 = (1, 1, ..., 1)^T say, and compute the residual r_0 = b - Ax_0. We then set v_0 = r_0 and feed the result into the following iterative loop:

(i) compute the coefficients of the conjugate gradients,

$$a_i = \frac{v_i^T r_i}{v_i^T A v_i};$$

(ii) update x,

$$x_{i+1} = x_i + a_i v_i;$$

(iii) compute the new residual,

$$r_{i+1} = r_i - a_i A v_i;$$

(iv) compute the next conjugate gradient using the Gram-Schmidt process,

$$v_{i+1} = r_{i+1} - \frac{r_{i+1}^T A v_i}{v_i^T A v_i} v_i.$$
Important points

1. The conjugate directions are v_0, v_1, v_2, ..., v_n.
2. The matrix A is only used to compute Av_i.
3. The process is stopped when ||r_i|| ≤ ε, i.e. when the residual falls below the required tolerance.
4. The algorithm works well when A is large and sparse.
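Putting steps (i)-(iv) together gives the following Python sketch (the function name, the tolerance handling and the small test system are illustrative assumptions; the coefficient formulae are the standard ones consistent with the steps above).

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-8):
    # A is assumed symmetric positive definite.
    n = len(b)
    x = np.ones(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                      # initial residual
    v = r.copy()                       # first conjugate direction
    for _ in range(n):
        if np.linalg.norm(r) <= tol:   # stopping criterion ||r|| <= epsilon
            break
        Av = A @ v                     # A only enters through this product
        a = (v @ r) / (v @ Av)         # (i)  coefficient
        x = x + a * v                  # (ii) update the solution
        r = r - a * Av                 # (iii) new residual
        beta = -(r @ Av) / (v @ Av)    # (iv) Gram-Schmidt type coefficient
        v = r + beta * v               #      next conjugate direction
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([2.0, 4.0, 6.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))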
We conclude this chapter with a schematic diagram illustrating the principal iterative methods of approach to solving Ax = b as given below.
Diagram 9.1: Schematic diagram illustrating the iterative methods. If A is sparse, symmetric and positive definite, the conjugate gradient method is used. If A is sparse and symmetric or non-symmetric then, provided ρ(M) < 1, SOR with ω = 1 (i.e. Gauss-Seidel iteration) is applied first; if the convergence is monotonic, SOR is used, and if the convergence is oscillatory, SUR is used.
9.7 Summary of Important Results

Iterative Method

$$x^{(n+1)} = Mx^{(n)} + c$$

Jacobi Iteration

$$M = -D^{-1}(L + U), \quad c = D^{-1}b$$

Gauss-Seidel Iteration

$$M = -(D + L)^{-1}U, \quad c = (D + L)^{-1}b$$

Relaxation Iteration

$$M = -(D + \omega L)^{-1}[(\omega - 1)D + \omega U], \quad c = \omega(D + \omega L)^{-1}b$$

SOR Method: 1 < ω < 2; SUR Method: 0 < ω < 1.

General Condition for Convergence

$$\|M\| < 1$$
Sufficient Condition for Convergence

$$|a_{ii}| > \sum_{j=1, j\neq i}^{n} |a_{ij}|$$

Necessary Condition for Convergence

$$\rho(M) = \max_i |\lambda_i| < 1$$
Conjugate Gradient Solution

$$x = \sum_{i} \frac{v_i^T b}{v_i^T A v_i} v_i, \quad v_i^T A v_j = 0 \text{ if } i \neq j$$

9.8 Further Reading
• Traub J F, Iterative Methods for the Solution of Equations, Prentice-Hall, 1964.
• Smith G D, Numerical Solution of Partial Differential Equations, Oxford University Press, 1978.
• Fogiel M, The Linear Algebra Problem Solver, Research and Education Association, 1986.
• Gerald C F and Wheatley P O, Applied Numerical Analysis, Addison-Wesley, 1989.
• Ciarlet P G, Introduction to Numerical Linear Algebra and Optimization, Cambridge University Press, 1989.
9.9 Problems
9.1 (i) Solve the following system of equations using Gauss-Seidel iteration:

$$-2x_1 + x_2 = -1,$$
$$x_1 - 2x_2 + x_3 = 0,$$
$$x_2 - 2x_3 + x_4 = 0,$$
$$x_3 - 2x_4 = 0.$$
Observe the number of iterations that are required for the process to converge, giving a solution that is correct to 2 decimal places.
(ii) Compare the number of iterations required to solve Ax = (2, 4, 6, 8)^T using first the Jacobi and then the Gauss-Seidel method, where

$$A = \begin{pmatrix} 4 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 0 & 0 & 1 & 4 \end{pmatrix}.$$

9.2 Consider the following equations:
$$x_1 + 6x_2 - 3x_3 = 4,$$
$$2x_1 + x_2 + 4x_3 = 7.$$
State whether or not the characteristic matrix of this system is diagonally dominant. Compute the Jacobi and Gauss-Seidel iteration matrices for this system and solve these equations using Jacobi iteration and then Gauss-Seidel iteration. Use a 'stopping criterion' given by

$$\max_i |x_i^{(n+1)} - x_i^{(n)}| < \frac{1}{2}\times 10^{-2}.$$

If we wanted to improve the rate of convergence of the Gauss-Seidel process, should we use SOR or SUR? By choosing a suitable value for the relaxation parameter, solve the system of equations above using the relaxation method. Is the rate of convergence significantly improved? Comment on your result.
9.3 Compare the convergence rates of Gauss-Seidel and SOR iteration, using a relaxation parameter value of 1.6, to solve the equations:

$$x_1 + 6x_2 + x_3 = 1,$$
$$x_2 + 6x_3 + x_4 = 1,$$
$$3x_1 + x_3 - 3x_4 = 1.$$

Use a 'stopping criterion' of 5 × 10^{-3}.
Comment on the result.
9.4 For which of the following systems of equations does the (i) Jacobi method and (ii) Gauss-Seidel method converge? Start with the zero vector and use a tolerance of 0.001:

(a)

$$-4x_1 + x_2 = 1,$$
$$x_1 - 4x_2 + x_3 = 1,$$
$$x_2 - 4x_3 + x_4 = 1,$$
$$x_3 - 4x_4 = 1.$$
(b)

$$2x_1 + x_2 + x_3 = 4,$$
$$x_1 + 2x_2 + x_3 = 4,$$
$$x_1 + x_2 + 2x_3 = 4.$$
(c)

$$x_1 + x_2 + x_3 = 0,$$
9.5 Find the spectral radii of the Jacobi and Gauss-Seidel iteration matrices corresponding to each system given in question 9.4. Hence, comment on the convergence or divergence in each case.

9.6 Which, if any, of the coefficient matrices in question 9.4 are diagonally dominant? Is diagonal dominance a necessary condition for convergence of the Jacobi and Gauss-Seidel methods?

9.7 ||v|| denotes any norm on R^n and ||A|| is the associated norm for an n × n matrix A. Prove that if ρ(A) denotes the spectral radius of A, then

$$\rho(A) \le \|A\|.$$

9.8 Find a set of vectors which are mutually conjugate with respect to the matrix
$$A = \begin{pmatrix} \cdot & \cdot & -1 \\ \cdot & \cdot & 1 \\ \cdot & \cdot & 3 \end{pmatrix}.$$

Hence, solve the equation
$$Ax = (4, 6, 4)^T.$$

9.9 Use the Gram-Schmidt process to solve the equation Ax = (4, 4, 4)^T where

$$A = \begin{pmatrix} 1 & \cdot & \cdot \\ 3 & \cdot & \cdot \\ -1 & \cdot & \cdot \end{pmatrix}$$