Minimal Residual Smoothing in Multi-Level Iterative Method

Jun Zhang
Department of Mathematics, The George Washington University, Washington, DC 20052
ABSTRACT

A minimal residual smoothing (MRS) technique is employed to accelerate the convergence of the multi-level iterative method by smoothing the residuals of the original iterative sequence. The sequence with smoothed residuals is re-introduced into the multi-level iterative process. The new sequence generated by this acceleration procedure converges much faster than both the sequence generated by the original multi-level method and the sequence generated by the MRS technique. The cost of this acceleration scheme is independent of the original operator and in many cases is negligible. The emphasis of this paper is on the practical implementation of MRS acceleration techniques in the multi-level method. The discussions are focused on the two-level method because the acceleration scheme is only applied on the finest level of the multi-level method. Numerical experiments using the proposed MRS acceleration scheme to accelerate both the two-level and multi-level methods are conducted to show the efficiency and the cost-effectiveness of this acceleration scheme. © Elsevier Science Inc., 1997
1. INTRODUCTION
In this paper, we concern ourselves with an acceleration scheme for the multi-level iterative solution of the linear system

A^h u^h = f^h.    (1)

In many problems of practical interest, system (1) results from discretized partial differential equations. In this case, the superscript h indicates the uniform mesh-size associated with the grid space Ω^h.
In practical applications, A^h is usually very large and sparse but not necessarily symmetric positive definite. For example, if system (1) is obtained from a discretized convection-diffusion equation with high Reynolds number, A^h is nonsymmetric and indefinite. Unlike in direct methods, in many iterative methods A^h is not stored. Hence, if A^h (with variable entries) has to be evaluated during the iteration process, we say that A^h is complicated. The iterative solution of (1) is of great interest in scientific computing because direct methods usually cannot handle such a large system.

Classical relaxation methods (e.g., Jacobi, Gauss-Seidel, and SOR) for solving system (1) begin with an initial guess at a solution and quickly damp (smooth) the high frequency error components, whose short wavelengths are comparable to the mesh-size, but leave the low frequency error components, with long wavelengths, almost unchanged. Hence, classical iterative methods work very well for the first several iterations. Inevitably, however, the convergence slows and the entire iterative scheme appears to stall.

One effective technique to remove the low frequency errors is to project them to a coarser grid, on which the errors become more oscillatory and thus are more amenable to relaxation methods. This leads to the two-level method. The idea of treating smooth errors on the coarse grid can be generalized recursively to the multi-level or multigrid method, in which the coarse-level subproblem is not solved exactly on the coarse grid; instead, high frequency errors are smoothed out and the remaining low frequency errors are projected to a yet coarser grid to be smoothed out there.

The standard multi-level method is extremely efficient for solving elliptic problems such as Poisson-type equations [1], but convergence is slowed down when it is used to solve nonelliptic problems or problems containing nonelliptic components, such as steady state incompressible flow problems with high Reynolds number [2]. In these cases, acceleration schemes are usually needed to obtain a satisfactory convergence rate. Various acceleration schemes have been proposed by many investigators to accelerate different procedures of the multi-level method in different situations [3-6].

In this paper, we investigate the feasibility and the efficiency of employing minimal residual smoothing techniques to accelerate the convergence of the multi-level method (MLM). For reasons which will become clear later, the acceleration scheme discussed in this paper is applied on the finest level only. Hence, our discussion will be focused on the two-level method, because it contains all the basic ideas of the multi-level (multigrid) method and because the multi-level method can be viewed as using a recursively defined two-level method to solve the coarse-grid sub-problem. For the two-level method defined below, we have a coarse-level grid space Ω^H with mesh-size H. Usually, H is chosen as 2h for convenience, but this is not necessary. First, we formally formulate the two-level method (TLM) as follows.
ALGORITHM 1. Two-level method (TLM)
  Relax ν_1 times on A^h u^h = f^h with a given initial guess u^h.
  Compute r^h = f^h - A^h u^h.
  Restrict r^H = R r^h.
  Solve e^H = (A^H)^{-1} r^H.
  Correct u^h = u^h + P e^H.
  Relax ν_2 times on A^h u^h = f^h with the initial guess u^h.

In Algorithm 1, R is the projection (restriction) operator which projects the residuals from the fine grid Ω^h to the coarse grid Ω^H, and P is the interpolation (prolongation) operator which interpolates the coarse-grid-correction back to the fine grid. For more details on the inter-grid transfer operators, readers are referred to [1, 7]. In our following discussions, we use superscripts to indicate the grid levels and subscripts to indicate the iterations. However, when there is no possibility of confusion or no need to indicate them explicitly, we drop the superscripts or subscripts for brevity.

Suppose that a sequence {u_k} is generated by some iterative method with associated residual sequence {r_k}. Since it is generally not possible to measure the convergence of the error directly, the quality of the iteration is usually judged by the behavior of the residual norm sequence {||r_k||}, where ||·|| is some norm, e.g., the Euclidean norm. Usually, it is desirable that {||r_k||} converges "smoothly" to zero. In many classical iterative methods, residuals are not effectively utilized in the iteration process; they are usually used only to measure the convergence. Let us take a detour to review another category of iterative methods which utilize residual techniques extensively. In the widely used generalized minimal residual (GMRES) method [8], each u_k is characterized by
||f - A u_k||_2 = min_{u ∈ u_0 + K_k(r_0, A)} ||f - A u||_2,

where ||·||_2 is the Euclidean norm and the Krylov subspace K_k(r_0, A) is defined by

K_k(r_0, A) = span{r_0, A r_0, ..., A^{k-1} r_0}.
For GMRES, {||r_k||_2} converges to zero optimally among all Krylov subspace methods for which u_k ∈ u_0 + K_k(r_0, A). Other comparable methods, such as the biconjugate gradient (BCG) [9, 10] and conjugate gradient squared
(CGS) [11], have certain advantages over GMRES, but often exhibit very irregular residual-norm behavior [12]. This irregular residual-norm behavior has provided an incentive for the development of methods that have similar advantages but produce better behaved residual norms, such as the biconjugate gradient stabilized (Bi-CGSTAB) methods [13, 14] and methods based on the quasi-minimal residual (QMR) approach [15-18]. Another approach to generating well-behaved residual norms has been proposed by Schönauer in [19] and investigated extensively by Weiss in [20]. In this approach, an auxiliary sequence {v_k} is generated from {u_k} by the relation
v_0 = u_0,   v_k = (1 - β_k) v_{k-1} + β_k u_k,   k = 1, 2, ...,    (2)

in which each β_k is chosen to minimize ||f - A[(1 - β) v_{k-1} + β u_k]||_2 over β ∈ R, i.e.,

β_k = -s_{k-1}^T (r_k - s_{k-1}) / ||r_k - s_{k-1}||_2^2,    (3)
where s_{k-1} = f - A v_{k-1}. The resulting residuals s_k obviously have monotonically decreasing Euclidean norms, i.e., ||s_k||_2 ≤ ||s_{k-1}||_2 and ||s_k||_2 ≤ ||r_k||_2 for each k. Weiss [20] explored and analyzed the residual smoothing technique of the form (2) extensively; it was referred to by Zhou and Walker [12] as the minimal residual smoothing (MRS) technique. Weiss showed that applying MRS to an orthogonal residual method results in a minimal residual method. Zhou and Walker extended MRS to a quasi-minimal residual smoothing (QMRS) technique applicable to any iterative method [12, 21]. They also showed that QMRS can be used to derive a QMR-type method from any given method. In their numerical experiments, it was found that MRS residual norms were often, although not always, slightly smaller than the QMRS residual norms and, in some cases, tended to remain a little more stable in the final iterations. They expressed some preference for MRS over QMRS for general use. In this paper, we choose MRS to smooth the residuals generated by the multi-level method (MLM). In our numerical experiments using MRS and QMRS to smooth and to accelerate the MLM sequence, the MRS sequence is better behaved than the QMRS sequence when both are used as smoothing techniques.
However, when both are used as acceleration techniques, MRS is far better than QMRS. The latter is actually inefficient in our experiments when it is used as an acceleration method for MLM. Hence, discussion of applying QMRS to the multi-level method will not be pursued here. More general forms of residual smoothing are considered by Brezinski and Redivo-Zaglia [22], but we will limit our attention to MRS of the form (2) in this paper.

Most existing research has focused on employing MRS techniques as smoothing methods to smooth the residuals generated by conjugate gradient (CG)-type methods. Although some numerical experiments have been reported to show that sequences generated by classical iterative methods such as the Jacobi and Gauss-Seidel methods can also be smoothed [22], we are not aware of discussions on the practical implementation of MRS techniques as accelerators of the classical iterative methods. The major reason is probably that residuals are not necessarily calculated in the classical iterative methods, and the employment of MRS techniques requires residual computation. This may render the cost of implementing MRS techniques prohibitive in the classical iterative methods. However, in the two-level (multi-level) method, residuals are computed automatically and used to form the subproblem on the coarse level. This advantage certainly reduces the cost of implementing MRS techniques. Furthermore, the subproblem on the coarse grid is a residual equation, and the smoothness of the residuals is essential for the solution of the coarse-grid subproblem to approximate that of the fine grid problem, so that a good coarse-grid-correction may be provided to the fine grid [1]. This is the primary incentive to use MRS techniques to smooth the residuals before they are projected to the coarse grid, and the reason that this approach accelerates the convergence of the original two-level (multi-level) iterative process.

This paper emphasizes the practical implementation of employing MRS in the two-level (multi-level) method to accelerate the convergence of the original method. Rigorous analysis is not included and will be reported elsewhere. In Section 2, we elaborate briefly on MRS techniques and design algorithms to employ MRS as a smoothing technique in the two-level iteration process. In Section 3, we develop the MRS acceleration scheme and algorithms that feed the sequence with "smoothed" residuals generated by MRS back into the two-level and multi-level iteration processes. Numerical tests are employed in Section 4 to show the remarkable acceleration rate and negligible cost of the MRS acceleration scheme used in the two-level and multi-level methods. Conclusions are included in Section 5. Suggestions on future research are given in Section 6.
2. MINIMAL RESIDUAL SMOOTHING

2.1. MRS Techniques
Assume that we have some iterative method which generates a sequence of iterates {u_k} with the corresponding residual sequence {r_k}. We formulate the MRS technique of [19, 20] as follows.
ALGORITHM 2. Minimal residual smoothing (MRS) [19, 20]
  Initialize s_0 = r_0 and v_0 = u_0.
  For k = 1, 2, ..., do:
    Compute u_k and r_k.
    Compute β_k = -s_{k-1}^T (r_k - s_{k-1}) / ||r_k - s_{k-1}||_2^2.
    Set s_k = s_{k-1} + β_k (r_k - s_{k-1}) and v_k = v_{k-1} + β_k (u_k - v_{k-1}).

Zhou and Walker [12] pointed out that there is a potential numerical difficulty with Algorithm 2. In practice, when the performance of an iterative method has become degraded through numerical error, the computed value of r_k can differ significantly from f - A u_k. When this happens, the value of s_k computed from Algorithm 2 can differ significantly from f - A v_k. They gave a mathematically equivalent variation that does not suffer this difficulty, provided an accurate value of A p_k is available for each p_k = u_k - u_{k-1}, which is very often the case in the CG-type methods. In Algorithm 3, both s_k and v_k are determined directly from p_k and A p_k. In particular, r_k is not involved in the computation at all. The intermediate quantities x_k and y_k are maintained so that, after the updating in the final step, we have r_k = s_k - x_k and u_k = v_k + y_k. A comparative study of the practical behavior of Algorithms 2 and 3 was illustrated numerically in [12], which shows that in a deliberately designed numerical experiment, Algorithm 3 converges while Algorithm 2 diverges.
ALGORITHM 3. Minimal residual smoothing (MRS) [12]
  Initialize s_0 = r_0, v_0 = u_0, and x_0 = y_0 = 0.
  For k = 1, 2, ..., do:
    Compute p_k = u_k - u_{k-1} and A p_k.
    Set x_k = x_{k-1} + A p_k and y_k = y_{k-1} + p_k.
    Compute β_k = s_{k-1}^T x_k / x_k^T x_k.
    Set s_k = s_{k-1} - β_k x_k and v_k = v_{k-1} + β_k y_k.
    Update x_k = (1 - β_k) x_k and y_k = (1 - β_k) y_k.
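As an illustration, the following sketch (ours, not part of the original paper) implements the update of Algorithm 2 in Python/NumPy for a generic iteration. The callable `iterate`, assumed to return the next iterate together with its residual, stands for whatever underlying method produces {u_k} and {r_k}.

```python
import numpy as np

def mrs_smooth(u0, r0, iterate, num_steps):
    """Minimal residual smoothing (a sketch of Algorithm 2).

    iterate(u) is assumed to return (u_next, r_next) with
    r_next = f - A u_next.  MRS only combines these vectors;
    it never applies the operator A itself.
    """
    u, r = u0, r0
    v, s = u0.copy(), r0.copy()          # v_0 = u_0, s_0 = r_0
    for _ in range(num_steps):
        u, r = iterate(u)                # compute u_k and r_k
        d = r - s                        # r_k - s_{k-1}
        denom = np.dot(d, d)
        beta = 0.0 if denom == 0.0 else -np.dot(s, d) / denom
        s = s + beta * d                 # smoothed residual s_k
        v = v + beta * (u - v)           # smoothed iterate  v_k
    return v, s
```

Algorithm 3 would replace the computation of r_k - s_{k-1} by the accumulated quantities x_k = x_{k-1} + A p_k and y_k = y_{k-1} + p_k, at the price of one extra operator application per step.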
Algorithm 3 is much more expensive than Algorithm 2 to implement in TLM. The computational cost of A p_k is equivalent to one relaxation on the fine grid and is not otherwise required in TLM. This cost is potentially high if A is complicated. For example, if A results from discretizing the convection-diffusion equation with variable coefficients and its entries are not precomputed and stored, the evaluation of A p_k itself may be many times more expensive than that of Algorithm 2. Also, {r_k} is generated automatically in TLM (MLM). This is one advantage of employing MRS in TLM (MLM) over employing it in classical single-level methods. Hence, our following implementation uses Algorithm 2 to generate the MRS sequence {v_k} and the smoothed residual sequence {s_k}. In practice, unless {r_k} departs seriously from f - A u_k, Algorithm 2 is always preferable to Algorithm 3.
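As a rough operation count (our own estimate, assuming n unknowns on the fine grid), the MRS update in Algorithm 2 costs about

n (forming r_k - s_{k-1}) + 4n (two inner products) + 5n (updating s_k and v_k) ≈ 10n

floating-point operations per major iteration, independent of A^h. Algorithm 3, in contrast, also needs the matrix-vector product A p_k, which costs at least twice the number of nonzeros of A^h with stored entries, and considerably more if the entries of A^h must be evaluated on the fly.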
2.2. TLM and MRS
Algorithm 2 may be applied to any step of Algorithm 1 where there is an updating of current values to generate one or more sequences of iterates with smoothed residuals. However, application of MRS requires the values of the residuals which are not generally computed at each step of TLM. Normally, the computation of the residuals is equivalent to one relaxation on that grid level and should be avoided whenever possible. Thus, we would like to use MRS to smooth the residuals calculated in TLM before they are weighted and projected to the coarse grid. Algorithm 4 is a procedure that incorporates MRS in TLM to generate a sequence with "smoothed" residuals.
ALGORITHM 4. Two-level method and MRS
  Given u_0^h and r_0^h = f^h - A^h u_0^h, set v_0 = u_0^h and s_0 = r_0^h.
  For k = 0, 1, 2, ..., do:
    Relax ν_1 times on A^h u_k^h = f^h with the given initial guess u_k^h.
    Compute r_k^h = f^h - A^h u_k^h.
    Compute β_{k+1} = -s_k^T (r_k^h - s_k) / ||r_k^h - s_k||_2^2.
    Set s_{k+1} = s_k + β_{k+1} (r_k^h - s_k) and v_{k+1} = v_k + β_{k+1} (u_k^h - v_k).
    Restrict r^H = R r_k^h.
    Solve e^H = (A^H)^{-1} r^H.
    Correct u_{k+1}^h = u_k^h + P e^H.
    Relax ν_2 times on A^h u_{k+1}^h = f^h with the initial guess u_{k+1}^h.

Algorithm 4 generates a TLM sequence {u_k^h} with the associated residual sequence {r_k^h} and an MRS sequence {v_k} with the associated residual sequence {s_k}. Note that v_k and s_k are generated before u_k^h and r_k^h for k ≥ 1. We have ||s_k||_2 ≤ ||r_{k-1}^h||_2 for all k ≥ 1. Furthermore, in most classical iterative methods used as relaxation schemes in TLM, r_0^h is not usually
calculated. The exception is when the initial guess is taken as zero, i.e., u_0^h = 0, in which case r_0^h = f^h. In general, we may not want to compute the initial residuals just for MRS. Algorithm 5 is a slight modification of Algorithm 4; v_0 and s_0 are generated only after the first round of smoothing sweeps on the fine grid.
ALGORITHM 5. Two-level method and MRS
  Given any initial guess u_0^h.
  For k = 0, 1, 2, ..., do:
    Relax ν_1 times on A^h u_k^h = f^h with the given initial guess u_k^h.
    Compute r_k^h = f^h - A^h u_k^h.
    If k = 0, then
      Set v_0 = u_0^h and s_0 = r_0^h.
    Else
      Compute β_k = -s_{k-1}^T (r_k^h - s_{k-1}) / ||r_k^h - s_{k-1}||_2^2.
      Set s_k = s_{k-1} + β_k (r_k^h - s_{k-1}) and v_k = v_{k-1} + β_k (u_k^h - v_{k-1}).
    End if.
    Restrict r^H = R r_k^h.
    Solve e^H = (A^H)^{-1} r^H.
    Correct u_{k+1}^h = u_k^h + P e^H.
    Relax ν_2 times on A^h u_{k+1}^h = f^h with the initial guess u_{k+1}^h.

Algorithm 5 generates an auxiliary sequence {v_k} with an associated residual sequence {s_k} which satisfies ||s_k||_2 ≤ ||r_k||_2 for all k. The new residual norm sequence {||s_k||_2} is better behaved than the original residual norm sequence {||r_k||_2}. We actually have [23]

||s_k||_2 ≤ min{||r_0||_2, ||r_1||_2, ..., ||r_k||_2}.

This implies that even if {||r_k||_2} is not monotonically decreasing, {||s_k||_2} is, since ||s_k||_2 ≤ ||s_{k-1}||_2 for all k. In our following discussions and numerical experiments, we use Algorithm 5 to generate the MRS sequence.
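For completeness, the bound above follows directly from the construction (a short justification, not in the original): β_k minimizes ||s_{k-1} + β(r_k^h - s_{k-1})||_2 over all β ∈ R, so taking β = 0 and β = 1 gives ||s_k||_2 ≤ ||s_{k-1}||_2 and ||s_k||_2 ≤ ||r_k||_2, respectively; since s_0 = r_0, induction on k yields ||s_k||_2 ≤ min{||r_0||_2, ||r_1||_2, ..., ||r_k||_2}.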
3. MRS ACCELERATION SCHEMES

3.1. TLM with MRS Acceleration
The sequence {v_k} generated by Algorithm 5 is guaranteed to have monotonically decreasing residual norms {||s_k||_2}. Most existing research uses MRS techniques as a means to stabilize the residual sequence generated by some iterative method (usually CG-type methods). Such studies
utilize the smoothing property of the MRS techniques and have been reported by many investigators [12, 20, 21, 23]. However, as Zhou and Walker remarked in [12], having a smoothly decreasing or monotonically decreasing residual norm may be of real importance or just a nicety, depending on the particular application. Employment of MRS techniques as a means of acceleration has not been pursued extensively outside the context of CG-type methods. On the other hand, using MRS techniques to accelerate classical iterative methods still faces the cost of residual computation, which may render a potential MRS acceleration scheme very expensive. If MRS techniques are applied to the sequences generated by classical iterative methods such as the Jacobi or Gauss-Seidel methods, then unless the original sequence diverges or behaves very irregularly, there is a serious question about the cost of generating such a new sequence compared against doing one more iteration of the original method; the cost of generating the new sequence is normally larger than that of the original iteration, due to the residual computation.

It would be better if we could feed the new sequence, which has better behaved residual norms, back into the original iteration, to accelerate its convergence. In return, the accelerated original sequence will help MRS generate a better new sequence with even more "smoothed" residuals. However, this is not always possible. For instance, if the original sequence is generated by some Krylov subspace method, as is usually the case in the MRS context, this approach would destroy some properties of the original sequence, e.g., mutual orthogonality, which is essential for the original sequence to converge. In particular, Schönauer originally developed MRS to smooth the sequence generated by CG-type methods and explicitly mentioned that {s_k} is computed without feedback to the original sequence [19, p. 261]. Only in a later chapter, when he discussed the development of the PRES20 method, which is a pseudo-residual method, did he restart the original iteration from the smoothed sequence after 20 iterations [24]. He also claimed that a restart from the smoothed sequence pays.

In TLM, however, the situation is different. Since the new iterate does not result solely from the old values through the relaxation scheme (it involves the correction from the coarse grid), there is no specific property that the original sequence must obey. Thus we may expect that feeding back the new sequence generated by MRS will accelerate the convergence of the original iteration. A further advantage of the two-level method is that the residuals of the original sequence are calculated even when no MRS is involved. This advantage, if exploited properly, reduces the cost of implementing MRS acceleration schemes significantly.

As we noted before, there are many places in a two-level method where we may insert the MRS procedure, but we would like to do it in the
cheapest way, i.e., to use existing information as much as possible and to minimize the computation of quantities which are useful only for MRS. This leads us to insert the MRS procedure just after the residuals on the finest grid are computed and before they are projected to the coarse-level grid. At each major iteration, we replace the original TLM iterate u_k^h and its residual r_k^h by the MRS iterate v_k and the associated residual s_k. We then project the residual s_k to the coarse-level grid to form a coarse-level subproblem. (Note that we must replace both the TLM iterate u_k^h and its residual r_k^h at the same time; otherwise, the coarse-grid subproblem would provide a wrong correction to the fine grid.) In this way, we give the coarse grid smoothed residuals, which are essential for the coarse grid to provide a good coarse-grid-correction to the fine grid [1]. Therefore, we expect that the acceleration rate is favorable. The following Algorithm 6 is parallel to Algorithm 5, without unnecessarily computing the initial residuals before the MRS acceleration scheme is applied.
ALGORITHM 6. Two-level method with MRS feedback
  Given any initial guess u_0^h.
  For k = 0, 1, 2, ..., do:
    Relax ν_1 times on A^h u_k^h = f^h with the given initial guess u_k^h.
    Compute r_k^h = f^h - A^h u_k^h.
    If k = 0, then
      Set v_0 = u_0^h and s_0 = r_0^h.
    Else
      Compute β_k = -s_{k-1}^T (r_k^h - s_{k-1}) / ||r_k^h - s_{k-1}||_2^2.
      Set s_k = s_{k-1} + β_k (r_k^h - s_{k-1}) and v_k = v_{k-1} + β_k (u_k^h - v_{k-1}).
      Set u_k^h = v_k and r_k^h = s_k.
    End if.
    Restrict r^H = R r_k^h.
    Solve e^H = (A^H)^{-1} r^H.
    Correct u_{k+1}^h = u_k^h + P e^H.
    Relax ν_2 times on A^h u_{k+1}^h = f^h with the initial guess u_{k+1}^h.

We refer to the acceleration scheme of replacing the TLM (MLM) sequence and its residual sequence by the MRS sequence and its residual sequence as the MRS acceleration scheme or MRS feedback. The role of the MRS acceleration scheme in the two-level (multi-level) method is to accelerate the convergence of the coarse-grid-correction by reducing the norm of the residuals (smoothing the residuals). Since high frequency components of the errors have been effectively removed by the pre-smoothing sweeps, the
MRS acceleration scheme primarily reduces the norm of the residuals resulting from the low frequency components of the errors. On the other hand, we may think of the MRS acceleration scheme in a different way: MRS may be used repeatedly to smooth the high frequency errors after the initial iterate is generated by a particular relaxation scheme. This is cheaper than using more pre-smoothing sweeps, especially when A^h is complicated, since the MRS acceleration scheme is independent of the original operator A^h.

3.2. MLM with MRS Acceleration

We have developed algorithms to accelerate the convergence of the two-level method, and we predict that the results are favorable. One question left open in the introduction is why we are only interested in using the MRS acceleration scheme on the finest grid level. In a multi-level method, one may think that the MRS acceleration scheme developed above may be used on the coarse-level grids as well as on the finest grid. This is indeed possible if we are solely interested in generating a smoothed sequence, because MRS will generate a new sequence with much smoother residuals regardless of the quality and the nature of the original sequence. However, generating such a "well-behaved" sequence is meaningless in the multi-level method; it would not be used to accelerate the convergence of the original sequence. This new sequence cannot be fed back into the original iteration process in the way we discussed above for the finest level (an exception is discussed in Section 6), because in each major MLM iteration different residuals are projected to the coarse level and thus the subproblem on the coarse level is different. The iterates (the coarse-grid-corrections) generated by a particular coarse-grid iteration have no direct connection; they are used to correct the fine grid solution at each iteration and have no use after the coarse-grid-correction process. This explains why we only use the MRS acceleration scheme on the finest level.

The multi-level method (MLM) may be considered as a two-level method in which the coarse-grid subproblem is solved by recursively employing coarser grids to obtain coarse-grid-corrections. Algorithm 7 is the formal multi-level method with the MRS acceleration scheme, which is parallel to Algorithm 6 of the two-level method.
ALGORITHM 7. Multi-level method with MRS acceleration scheme: u^h ← MLM(u^h, f^h)
  Given any initial guess u_0^h.
  For k = 0, 1, 2, ..., do:
    If Ω^h is the coarsest grid, then
      Solve u_k^h = (A^h)^{-1} f^h.
    Else
      Relax ν_1 times on A^h u_k^h = f^h with the given initial guess u_k^h.
      Compute r_k^h = f^h - A^h u_k^h.
      If k = 0, then
        Set v_0 = u_0^h and s_0 = r_0^h.
      Else
        Compute β_k = -s_{k-1}^T (r_k^h - s_{k-1}) / ||r_k^h - s_{k-1}||_2^2.
        Set s_k = s_{k-1} + β_k (r_k^h - s_{k-1}) and v_k = v_{k-1} + β_k (u_k^h - v_{k-1}).
        Set u_k^h = v_k and r_k^h = s_k.
      End if.
      Restrict r^{2h} = R r_k^h.
      Set f^{2h} = r^{2h}.
      Set u^{2h} = 0.
      Do u^{2h} ← MLM(u^{2h}, f^{2h}) μ times.
      Correct u_{k+1}^h = u_k^h + P u^{2h}.
      Relax ν_2 times on A^h u_{k+1}^h = f^h with the initial guess u_{k+1}^h.
    End if.

The parameter μ in Algorithm 7 controls the number of times the cycle visits the coarse grid before it returns to the fine grid. In practice, only μ = 1 and μ = 2 are usually used. These correspond to the so-called V-cycle and W-cycle schemes, respectively (see [7, 25] for details on the definitions of various multigrid cycling schemes).
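To make the structure of Algorithms 5-7 concrete, the following Python sketch (ours, not from the paper) treats the relaxation and grid-transfer operators as assumed callables. Setting feedback=False gives the pure smoothing of Algorithm 5, while feedback=True gives the MRS acceleration of Algorithm 6.

```python
import numpy as np

def tlm_with_mrs(u, f, relax, residual, restrict, prolong, coarse_solve,
                 num_cycles, nu1=1, nu2=1, feedback=True):
    """Two-level cycle with minimal residual smoothing (a sketch).

    relax(u, f, nu)   -- nu relaxation sweeps on A^h u = f^h (e.g. Gauss-Seidel)
    residual(u, f)    -- returns f - A^h u
    restrict(r)       -- fine-to-coarse transfer R
    prolong(e)        -- coarse-to-fine transfer P
    coarse_solve(rH)  -- (approximate) solve of A^H e^H = r^H
    All of these are assumed interfaces, not part of the paper.
    """
    v = s = None
    for k in range(num_cycles):
        u = relax(u, f, nu1)                  # pre-smoothing
        r = residual(u, f)
        if k == 0:
            v, s = u.copy(), r.copy()         # v_0 = u_0, s_0 = r_0
        else:
            d = r - s
            denom = np.dot(d, d)
            beta = 0.0 if denom == 0.0 else -np.dot(s, d) / denom
            s = s + beta * d                  # smoothed residual s_k
            v = v + beta * (u - v)            # smoothed iterate v_k
            if feedback:                      # Algorithm 6: replace both the
                u, r = v.copy(), s.copy()     # iterate and its residual
        e = coarse_solve(restrict(r))         # coarse-grid correction
        u = u + prolong(e)
        u = relax(u, f, nu2)                  # post-smoothing
    return u
```

The multi-level version (Algorithm 7) is obtained by letting coarse_solve apply the same cycle recursively on the coarse grid, with MRS used on the finest level only.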
4. NUMERICAL EXAMPLES
Numerical examples are set up to test the following convection-diffusion equation with variable coefficients, satisfying a Dirichlet boundary condition:

u_xx + u_yy + p(x, y) u_x + q(x, y) u_y = f,   (x, y) ∈ Ω,    (4)
where Ω = (0, 1) × (0, 1) is the unit square. The magnitudes of p(x, y) and q(x, y) determine the ratio of convection to diffusion. In many problems
of practical interest, the convective terms dominate the diffusion. Numerical simulations of (4) become increasingly difficult as the ratio of convection to diffusion increases. Specifically, we take p(x, y) = Mx and q(x, y) = -My, where M is a positive constant to be chosen. The boundary conditions are given so that the exact solution is u(x, y) = xy(1 - x)(1 - y) exp(x + y). A high-order compact finite difference scheme is used to discretize (4), which results in a linear system with a compact nine-point formula (details on the discretization formula can be found in [26]). The reason for choosing this high-order discretization formula is that it is stable for all M, so we need not worry about the magnitude of M, which might cause divergence if a lower-order, e.g., five-point, formula were employed. To facilitate discussion, we define the cell Reynolds number associated with the discretized grid space Ω^h as
Re = max{ sup_{(x, y) ∈ Ω} |p(x, y)|, sup_{(x, y) ∈ Ω} |q(x, y)| } · h/2.
The magnitude of Re is supposed to affect the convergence of the numerical methods inversely. The coefficient matrix A^h on all levels is not stored but is computed in the process of iteration. For the numerical experiments with the two-level method, the fine grid mesh-size is 1/64 and the coarse grid mesh-size is 1/32. For the multi-level method, the coarsest grid contains only one unknown (with mesh-size 1/2). For all computations, we apply one pre-smoothing and one post-smoothing sweep (ν_1 = ν_2 = 1). The discrete grid space is naturally (lexicographically) ordered and point Gauss-Seidel relaxation is used as the smoothing method. The residual projection operator R is the full-weighting scheme and the interpolation operator P is the bi-linear interpolation (see [7]). Unless otherwise indicated explicitly, the norm used in the numerical experiments is the Euclidean norm. The computations are done on a Cray-90 vector machine at the Pittsburgh Supercomputing Center using Cray Fortran 77 in single precision (equivalent to double precision on serial machines such as SUN workstations).
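For this choice of coefficients (a check we add here for concreteness), sup_Ω |p(x, y)| = sup_Ω |q(x, y)| = M on the unit square, so the definition above reduces to Re = Mh/2. On the finest grid, h = 1/64, this gives Re = 1 for M = 128 and Re = 1000 for M = 1.28 × 10^5, the values quoted in the experiments below.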
4.1. TLM with MRS Acceleration

In this subsection, we test Algorithms 5 and 6, i.e., the two-level method without and with MRS feedback (acceleration). In the two-level method, the coarse-grid sub-problem is not solved exactly; instead, five relaxation sweeps are carried out on the coarse grid.
We have four sequences, generated by TLM without MRS, MRS without feedback, TLM with MRS feedback (acceleration), and MRS with feedback. First, we choose M = 0, so that (4) reduces to the Poisson equation. The convergence histories of the residual norm sequences are depicted in Figure 1. Since TLM is not a very efficient Poisson solver, it takes more than 400 iterations to converge to the limit of residual reduction. In this paper, the limit of residual reduction (LRR) is defined to be the limit of reducing the residual norm by an algorithm on a given computer using finite-precision computation. The sequence generated by MRS alone converges faster than that generated by TLM alone (without feedback). This shows the smoothing property of the MRS technique. When TLM is combined with MRS with feedback (acceleration), both sequences converge very fast and the number of iterations needed to reach LRR is almost halved. This demonstrates the acceleration property of the MRS technique.
FIG. 1. The convergence histories of the residual norms (in log scale) with M = 0. Solid line is for original TLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for TLM with MRS feedback (acceleration), dotted line is for MRS with feedback.
After all sequences converge to LRR, only that generated by MRS without feedback continues to decrease. This can be seen clearly in Figure 2, where more iterations are allowed for the same test condition. However, in practice, these continuously decreasing residual norms come at a high price and may not be cost-effective. Next, we take M = 128; the cell Reynolds number (Re) on the finest grid is 1. In this case, the two-level method is more efficient (the spectral radius of A^h is smaller for M = 128 than for M = 0). The convergence histories of the residual norm sequences are graphed in Figure 3. The sequence generated by MRS without feedback converges faster than that generated by TLM without feedback from MRS. This again demonstrates the minimal residual smoothing property of the MRS algorithm. The convergence histories of the sequences generated by TLM with MRS feedback and MRS with feedback overlap, because they actually accelerate each other.
FIG. 2. The convergence histories of the residual norms (in log scale) with M = 0. Solid line is for original TLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for TLM with MRS feedback (acceleration), dotted line is for MRS with feedback.
FIG. 3. The convergence histories of the residual norms (in log scale) with M = 128. Solid line is for original TLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for TLM with MRS feedback (acceleration), dotted line is for MRS with feedback.
However, since the original sequence (generated by TLM without feedback) converges quite satisfactorily, the acceleration by the MRS feedback is not as great as it was for the Poisson equation (see Figures 1 and 2). Hence, when the original sequence converges fast, the benefit of MRS acceleration is somewhat smaller. We emphasize that the MRS acceleration scheme is still efficient.

A convection-dominated case is studied next. We take M = 1.28 × 10^5; the cell Reynolds number on the finest grid is 1000. The initial residual norm is increased significantly. It is well known that the magnitude of the cell Reynolds number affects the convergence of the numerical methods inversely. The convergence histories of the residual norm sequences are depicted in Figure 4. It takes TLM without MRS feedback about 2500 iterations to reach LRR.
FIG. 4. The convergence histories of the residual norms (in log scale) with M = 1.28 × 10^5. Solid line is for the original TLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for TLM with MRS feedback (acceleration), dotted line is for MRS with feedback.
The sequence generated by MRS converges faster, but it is restricted by the TLM sequence, i.e., the two sequences are bound together by MRS. Only after the TLM sequence reaches LRR is MRS able to continue to decrease. Again, TLM with MRS feedback (acceleration) and MRS with feedback accelerate each other and their convergence histories are virtually the same. Both are remarkably better than the TLM and MRS sequences without feedback; the number of iterations is halved. This again supports the claim made earlier that TLM with MRS feedback (acceleration) works very well when the original sequence converges slowly. One might doubt the efficiency of the MRS acceleration scheme because it may be viewed as one sequence with double iterates: it would not be advantageous if the double iterates were generated at the same cost and merely reduced the iterations by half. In fact, the sequence generated by MRS is very cheap compared with that generated by TLM, because MRS does not
involve the coefficient matrix, whose realization is very expensive for variable coefficient equations. Therefore, the cost of the MRS acceleration scheme is minimal, if not negligible. More about the cost of the MRS acceleration scheme is given in subsection 4.3.
4.2. MLM with MRS Acceleration

In this subsection, we experiment with the multi-level method with the MRS acceleration scheme. In Algorithm 7, μ is chosen to be 1, which is a V-cycle scheme. Also, we use a true V-cycle algorithm. Specifically, with h = 1/64 on the finest level, there are 6 levels and the coarsest level contains just one unknown. Five relaxation sweeps are carried out on the coarsest grid. We mention explicitly that the MRS acceleration scheme is only applied on the finest level.

Again, we first consider the Poisson equation with M = 0. It is well known that the true multi-level method (MLM) is one of the most efficient Poisson solvers. The convergence histories of the residual norm sequences are depicted in Figure 5. The sequence generated by the standard MLM reaches LRR in about 12 iterations. The sequence generated by MRS without feedback converges slightly faster. On the other hand, the sequences generated by MLM with MRS acceleration and MRS with feedback mutually accelerate the convergence and reach LRR in fewer iterations. This means that the MRS acceleration scheme is efficient even when the original method converges fast, such as the true multi-level method, although the acceleration rate is somewhat low.

We again consider the case M = 128; the convergence histories of the residual norm sequences are contained in Figure 6. Again, the true multi-level method is very efficient in this case. The sequence generated by MRS converges slightly faster than that of MLM without feedback. The sequence generated by MLM with MRS acceleration converges faster than the sequences generated by MLM and MRS separately. The same is true for that generated by MRS with feedback. Comparing Figure 6 with Figure 5, we note that the acceleration rate is increased.

Next, we consider the multi-level method with a large Reynolds number, here M = 1.28 × 10^5. The convergence histories are depicted in Figure 7. Since the convergence of the true MLM is slow for large Reynolds number convection-diffusion problems, Figure 7 is similar to Figure 6, except that the acceleration rate is more attractive. MLM with MRS feedback is a powerful method to accelerate the true MLM in this case, with almost a one-half reduction in iterations. Our conclusion from these test examples is that the MRS acceleration scheme accelerates the convergence of the multi-level method regardless of the convergence of the original method.
FIG. 5. The convergence histories of the residual norms (in log scale) with M = 0. Solid line is for original MLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for MLM with MRS feedback (acceleration), dotted line is for MRS with feedback.
However, when the original method converges slowly, the benefit of the MRS acceleration scheme is greater than when the original method converges fast.
4.3. Computational Cost and Computed Accuracy

In this subsection, we test the computational cost of the MRS acceleration scheme. We again consider the two-level and multi-level methods separately. The computations are terminated when the residual in the L_2 norm on the finest level is reduced by a factor of 10^10. We record the iteration numbers, the CPU time in seconds, and the computed accuracy in the L_∞ norm. Table 1 is for the two-level method with or without the MRS acceleration scheme (feedback). From Table 1, we note again that TLM with MRS feedback accelerates the convergence of the original TLM considerably, especially when the original TLM converges slowly (M = 0 and M = 1.28 × 10^5).
FIG. 6. The convergence histories of the residual norms (in log scale) with M = 128. Solid line is for original MLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for MLM with MRS feedback, dotted line is for MRS with feedback.
The acceleration rates are very attractive. In the case of slow convergence, the acceleration rates are more than 40%. Furthermore, comparing the acceleration rates in CPU time with those in iterations, we find that the cost of the MRS acceleration is truly negligible, less than 1% of the TLM cost. This means that the MRS acceleration scheme is a very cost-effective acceleration scheme for TLM. Even for M = 128, where the acceleration is not very large, it comes at negligible cost. The computed accuracy is the same with or without the MRS acceleration scheme; this means the MRS acceleration scheme maintains the computed accuracy of the original TLM method.

Next, we consider the multi-level method with the MRS feedback acceleration scheme. The data from the numerical experiments are given in Table 2.
FIG. 7. The convergence histories of the residual norms (in log scale) with M = 1.28 × 10^5. Solid line is for original MLM without MRS feedback, dashed line is for MRS without feedback, dot-dashed line is for MLM with MRS feedback, dotted line is for MRS with feedback.
Table 2 gives results similar to Table 1. The acceleration rates are very attractive in all cases. The average acceleration rate is even better than that for the two-level method. The acceleration cost is negligible. The acceleration scheme does not change the computed accuracy of the original multi-level method. We note that, when M = 0, the acceleration rate in CPU time is slightly larger than that in iterations. This is possibly caused by the nature of the vector computer and the timing function, since many users use the computer at the same time and the overhead may be large with respect to the actual computing time in this case. From these tests, it is clear that the MRS acceleration scheme should be recommended for accelerating the convergence of the multi-level method.
TABLE 1
EFFICIENCY OF MRS ACCELERATION SCHEME IN TWO-LEVEL METHOD

            TLM without MRS                TLM with MRS Acceleration      Acceleration Rate
M           Iterations  CPU (s)  Error     Iterations  CPU (s)  Error     Iterations  CPU
0           358         27.98    5.57(-9)  195         15.28    5.57(-9)  45.53%      45.39%
128         23          1.83     1.84(-6)  21          1.67     1.84(-6)  9.52%       8.74%
128000      1982        155.74   1.32(-4)  1111        87.48    1.32(-4)  43.95%      43.83%
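Reading the tables, the acceleration rates are evidently the relative reductions in iterations and in CPU time (our interpretation of the tabulated values); for example, for M = 0 in Table 1, (358 - 195)/358 ≈ 45.53% and (27.98 - 15.28)/27.98 ≈ 45.39%, matching the entries in the last two columns.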
When the linear system results from discretized partial differential equations with variable coefficients, the complexity of the matrix renders the cost of implementing the MRS acceleration scheme negligible.
5. CONCLUSIONS
We have shown that minimal residual smoothing (MRS) is an effective way of generating a sequence with "smoothed" residual norms in both the two-level method (TLM) and the multi-level method (MLM). We explored the acceleration property of the MRS techniques and constructed algorithms that effectively employ the MRS acceleration scheme to accelerate the convergence of both TLM and MLM. Hence, MRS techniques are not only effective smoothing techniques to stabilize the residuals of the original sequence, but also cost-effective acceleration techniques to accelerate the convergence of the original sequence. The MRS acceleration scheme developed in this paper is independent of the original operator A^h and of the relaxation method employed. Our numerical experiments demonstrated that the acceleration rate is remarkable and the cost of the acceleration scheme is negligible.
TABLE 2
EFFICIENCY OF MRS ACCELERATION SCHEME IN MULTI-LEVEL METHOD

            MLM without MRS                MLM with MRS Acceleration      Acceleration Rate
M           Iterations  CPU (s)  Error     Iterations  CPU (s)  Error     Iterations  CPU
0           12          0.788    5.59(-9)  10          0.656    5.59(-9)  16.67%      16.75%
128         30          1.963    1.84(-6)  20          1.328    1.84(-6)  33.33%      32.35%
128000      2036        132.67   1.32(-4)  1117        74.99    1.32(-4)  45.14%      43.48%
We also showed that the benefit of the MRS acceleration is largest when the original TLM or MLM converges slowly, but the acceleration is achieved regardless of the convergence of the original method.
6. SUGGESTIONS ON FUTURE RESEARCH
For those sequences which converge very slowly, such as those generated by TLM or MLM in solving the convection-diffusion equations with variable coefficients and high Reynolds number, there may be a reason to use MRS techniques to accelerate the original method more than once. One way of doing this is to smooth the residual sequence repeatedly. Another way is to smooth the sequence when it finishes the coarse-grid-correction cycle. The second way requires the computation of residuals. However, if the MRS acceleration scheme is more efficient than the relaxation sweep at removing the high frequency errors, this approach is viable as a substitute for inefficient relaxation schemes. An investigation of this approach is under way.

The more general minimal residual smoothing techniques discussed by Brezinski and Redivo-Zaglia [22] may be employed to accelerate the convergence of the multi-level method. There is also the possibility of defining the new sequence by minimizing the residual norm in a different sense, e.g., in the energy norm, if A^h is positive definite. The energy norm involves the coefficient matrix A^h and therefore may cost more when A^h is complicated.

In Section 4.2, we mentioned that the MRS techniques are applicable only to the finest level. On the coarse levels, each iteration solves a different sub-problem projected from the fine level, and thus no continuous MRS acceleration like that for the fine level makes sense on the coarse levels. However, MRS techniques may still be used to smooth the residuals generated on a coarse level before and after they are projected to an even coarser level. We may expect that this will give a better residual sequence to the coarser level. In this way, only a short MRS sequence is generated in each MLM iteration.
The author would like to acknowledge with gratitude the valuable discussions through electronic mail with Professor Homer F. Walker, who also provided many sources of references.

REFERENCES

1 A. Brandt, Multi-level adaptive solution to boundary-value problems, Math. Comput. 31:333-396 (1977).
2 I. Yavneh, Multigrid Techniques for Incompressible Flows, Ph.D. Thesis, The Weizmann Institute of Science, Rehovot, Israel, 1991.
3 A. Brandt and I. Yavneh, On multigrid solution of high-Reynolds incompressible entering flows, J. Comput. Phys. 101:151-164 (1992).
4 A. Brandt and I. Yavneh, Accelerated multigrid convergence and high-Reynolds recirculating flows, SIAM J. Sci. Comput. 14:607-626 (1993).
5 A. Brandt and V. Mikulinsky, On recombining iterants in multigrid algorithms and problems with small islands, SIAM J. Sci. Comput. 16:20-28 (1995).
6 J. Zhang, Acceleration of five-point Red-Black Gauss-Seidel in multigrid for two dimensional Poisson equation, Appl. Math. Comput. (accepted).
7 P. Wesseling, An Introduction to Multigrid Methods, Pure & Appl. Math., John Wiley & Sons, Chichester, 1992.
8 Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual method for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7:856-869 (1986).
9 R. Fletcher, Conjugate gradient methods for indefinite systems, in Numerical Analysis (G. A. Watson, Ed.), Dundee 1975, Scotland, Lecture Notes in Math. 506, Springer-Verlag, Berlin, 73-89 (1976).
10 C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bur. Stand. 49:33-53 (1952).
11 P. Sonneveld, CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 10:36-52 (1989).
12 L. Zhou and H. F. Walker, Residual smoothing techniques for iterative methods, SIAM J. Sci. Comput. 15:297-312 (1994).
13 M. H. Gutknecht, Variants of BICGSTAB for matrices with complex spectrum, Tech. Rep. 91-15, Interdisziplinäres Projektzentrum für Supercomputing, Eidgenössische Technische Hochschule, Zürich, Aug. 1991.
14 H. A. van der Vorst, BI-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 13:631-644 (1992).
15 T. F. Chan, L. de Pillis, and H. van der Vorst, A transpose-free squared Lanczos algorithm and application to solving non-symmetric linear systems, Tech. Rep. CAM 91-17, Dept. of Maths., UCLA, 1991.
16 T. F. Chan, E. Gallopoulos, V. Simoncini, T. Szeto, and C. H. Tong, QMRCGSTAB: a quasi-minimal residual variant of the Bi-CGSTAB algorithm for nonsymmetric systems, Tech. Rep. CAM 92-26, Dept. of Maths., UCLA, June 1992.
17 R. W. Freund and N. M. Nachtigal, QMR: a quasi-minimal residual method for non-Hermitian linear systems, Numer. Math. 60:315-339 (1991).
18 R. W. Freund, A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems, SIAM J. Sci. Comput. 14:470-482 (1993).
19 W. Schönauer, Scientific Computing on Vector Computers, North-Holland, Amsterdam, 1987.
20 R. Weiss, Convergence Behavior of Generalized Conjugate Gradient Methods, Ph.D. Thesis, Univ. of Karlsruhe, Germany, 1990.
21 H. F. Walker, Residual smoothing and peak/plateau behavior in Krylov subspace methods, Appl. Numer. Math. 19:279-286 (1995).
22 C. Brezinski and M. Redivo-Zaglia, Hybrid procedures for solving linear systems, Numer. Math. 67:1-19 (1994).
23 R. Weiss, Relations between smoothing and QMR, Tech. Rep., Universität Karlsruhe, Numerikforschung für Supercomputer, 1993. Submitted to Numer. Math.
24 W. Schönauer, H. Müller, and E. Schnepf, Pseudo residual type methods for the iterative solution of large linear systems on vector computers, in Parallel Computing 85 (M. Feilmeier, J. Joubert, and U. Schendel, Eds.), North-Holland, 193-198 (1986).
25 A. Brandt, Multigrid Techniques: 1984 Guide, with Applications to Fluid Dynamics, GMD Studien Nr. 85, GMD-AIW, Postfach 1240, D-5205 St. Augustin 1, W. Germany, 1984.
26 M. M. Gupta, R. P. Manohar, and J. W. Stephenson, A single cell high order scheme for the convection-diffusion equation with variable coefficients, Int. J. Numer. Methods Fluids 4:641-651 (1984).