Journal of Economic Dynamics and Control 20 (1996) 1051-1071

Hybrid algorithms with automatic switching for solving nonlinear equation systems

A. Hughes Hallett*, Y. Ma, Y.P. Yin

Department of Economics, University of Strathclyde, Glasgow G4 0LN, UK

(Received April 1995; final version received October 1995)

*Corresponding author (also affiliated with CEPR, London).
Abstract

An optimal switching device is introduced for hybrid algorithms which try to combine the superlinear convergence speeds of Newton model solution techniques with the cheapness of slower Gauss-Seidel techniques. We also propose a device for combining second-order and Gauss-Seidel iterative techniques. Tests show that these hybrids lead to powerful improvements in solving nonlinear models, both in terms of accelerating convergence and reducing computational costs and in extending the radius of convergence in difficult cases.

Key words: Nonstationary iterations; Convergence radius; Model solution
JEL classification: C63; C87
1. Introduction
Algorithms designed to solve multi-sector economic models are typically more sensitive to the choice of start values than those designed to solve other types of equation system. This is because it is much more difficult to select a set of mutually consistent start values for the different sectors, which are also within some neighbourhood of the final solution, than it is to pick a small set of roughly consistent economic aggregates dominated by simple trends. The same problem arises with dynamic rational expectations models, where we have to pick start values for each time period to be solved - but those further ahead are likely to become increasingly off track and hence increasingly
dependent on the accumulation of errors from the earlier (but inaccurate) choice of start values. And if the equation system also contains intertemporal optimising behaviour, then start values must be selected jointly for each sector's activity level and in each time period. The two problems will compound one another. In addition, if each sector contains significant nonlinearities (in our case from the sectoral production and utility functions), then the neighbourhood of acceptable start values must be reduced to accommodate the nonlinear intersectoral dynamics. These three features make it much harder to secure convergence (or rapid convergence) for nonlinear models using conventional algorithms. This paper introduces hybrid algorithms and examines how, and how far, they can improve on the convergence characteristics of the underlying algorithms from which they are constructed - specifically in terms of accelerating convergence, reducing the computational burden, and (most importantly) expanding the radius of convergence. Our hybrid algorithm is a weighted combination of Newton and Fast Gauss-Seidel iterations. An optimal weighting scheme is then developed, with the additional feature that it can be switched automatically (that is, without intervention by the user) either to the component algorithm with the larger convergence radius or to the component algorithm with the lower computational burden. This allows us to trade off the possibility of a larger number of steps against a lower workload per step in the sparse nonlinear equation systems found in economics. But convergence itself still cannot be guaranteed. Consequently we compare this mechanism to hybrid schemes based on second-order iterations which do have theoretically guaranteed convergence. The success of the latter shows that it is the automatic extension of convergence radii which is our technique's most important contribution.
2. Conventional equation solution techniques

2.1. Newton methods
Consider the general nonlinear equation system:

    f_i(y, z) = 0,   i = 1, ..., n,   (1)

where y ∈ R^n and z ∈ R^m are vectors of real-valued unknown and predetermined variables, respectively, and where the f_i(·) are arbitrary real-valued functions. For simplicity we assume that the vector-valued function representing the entire equation system - f(y, z) = 0 - has at least one real solution, y*, for the known or projected values of z.
Newton's method for solving the nonlinear system (1), given z, is based on a first-order expansion about some trial solution value y^(0):

    f_i(y^(0)) + Σ_{j=1}^{n} (∂f_i/∂y_j)|_{y^(0)} Δy_j = 0,   i = 1, ..., n.   (2)

This yields the first-order iteration

    y^(s) = y^(s-1) - [F^(s-1)]⁻¹ f(y^(s-1), z),   (3)
where F^(s-1) = [∂f/∂y]|_{y^(s-1)} is the matrix of partial derivatives evaluated at the current iterate. The convergence conditions for (3) can be summarised as follows.¹ If f(·) is continuously differentiable over a convex set D containing y*, where f(y*, z) = 0 and F(y*) is nonsingular, then there exists an open set C about y* such that (3) converges at least linearly from any y^(0) ∈ C. If, in addition, the Lipschitz condition ||F(y) - F(y*)|| ≤ d||y - y*|| holds for y ∈ C and some d > 0, the rate of convergence becomes quadratic; i.e.,

    ||y^(s) - y*|| ≤ δ||y^(s-1) - y*||^{1+r}   (4)

holds for r = 1 and some δ > 0. Quadratic convergence will always be observed if D, and hence C, is chosen small enough when f(·) is twice differentiable at y* (Dennis and Moré, 1977). We are therefore virtually assured quadratic convergence within some neighbourhood of y*. But that neighbourhood could be very small, and we still face the problem of how to find starting values y^(0) which actually lie within C. The contribution of this paper is to show how that can be done since, if y^(0) is chosen some distance from C, the iteration may not even converge.²
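To fix ideas, here is a minimal Python sketch of (2)-(3), with the Jacobian F approximated column by column by forward differences. The function names, tolerances, and the small test system are our own illustrative choices, not anything from the original:

    import numpy as np

    def newton_solve(f, y0, tol=1e-8, max_steps=50, h=1e-7):
        """Plain Newton iteration (3), with the Jacobian F of (2)
        approximated column by column by forward differences."""
        y = np.asarray(y0, dtype=float)
        n = y.size
        for _ in range(max_steps):
            fy = f(y)
            if np.max(np.abs(fy)) < tol:
                return y
            F = np.empty((n, n))
            for j in range(n):                  # j-th column of F
                e = np.zeros(n)
                e[j] = h
                F[:, j] = (f(y + e) - fy) / h
            y = y - np.linalg.solve(F, fy)      # y(s) = y(s-1) - F^{-1} f
        raise RuntimeError("no convergence: y(0) may lie outside C")

    # Illustrative system with a root at y* = (1, 1).
    g = lambda y: np.array([y[0]**2 + y[1]**2 - 2.0, y[0] - y[1]])
    print(newton_solve(g, [2.0, 0.5]))

The linear solve stands in for the 'inversion by factorisation' assumed throughout the paper (see footnote 3 below).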
2.2. Modified Newton methods
The advantages of using Newton solution methods are:

(i) Convergence is extremely fast (superlinear) once the iterates are within C.
(ii) Convergence is guaranteed for an arbitrary system given any start within C.
(iii) No parameters need to be set to ensure convergence.
¹Ortega and Rheinboldt (1970, p. 312).
²Becker and Rustem (1993) cite a number of studies where Newton methods either failed to converge or were rather slow. Techniques which extend their radius of convergence, or which speed them up, would therefore be an important advance.
However, the disadvantages are:

(a) Because of the derivative evaluations and matrix inversions, it is computationally expensive without the compensating guarantee of global convergence (the number of calculations per step is of O(n³)).³

(b) The initial 'guess', y^(0), may have to be a good approximation of y* because C can be very small. Divergence is certainly possible, and simpler but slower methods may have a wider radius of convergence.

(c) The inversion of F prevents any exploitation of the sparseness of F when solving for each iterate. Sparseness is a feature common to all economic models.

There are several ways to reduce the computational burden of the Newton method:

Avoiding any derivative evaluations: The matrix F can be constructed using information from past iterates rather than by differentiation. Ortega and Rheinboldt (1970) suggest various schemes for this purpose. But to identify all the elements of F requires n initial iterations, and then a matrix inversion of order n, so there are no computational savings.

Avoiding repeated matrix inversions: Another suggestion is to avoid inverting F repeatedly. This involves evaluating F only every m-th step. If m = 1, we have the original Newton method. If m = ∞, the same F values will be used throughout. The overall rate of convergence now falls to (m + 1)/m (Ortega and Rheinboldt, 1970, p. 316). So as m increases, the speed of convergence falls to that of linear first-order iterations and the Newton method's advantage has been lost. An intermediate position is occupied by Becker and Rustem's (1993) N-SCE method, which re-evaluates only parts of F and then only when it improves the rate of convergence. The same results must therefore apply, although the fall to linear convergence rates will of course be a little slower (a sketch of the m-step scheme follows below).

Avoiding matrix inversions altogether: Yet another method is to generate successive F⁻¹ values by rank-one or rank-two updating schemes, rather than by function evaluations. The rate of convergence of (3) using these updates is at least linear if f(·) satisfies the Lipschitz condition (Dennis and Moré, 1977). However, the system's sparseness is not exploited and the number of calculations per step is still O(n²). With convergence slower by an order of magnitude, that means there are no computational savings.

Sparse matrix techniques: Perhaps the most effective way of reducing the computational burden of Newton methods is to employ sparse techniques within each step.
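As promised above, a minimal sketch of the m-step economy - re-evaluating and re-factorising F only every m-th step. The interface and the use of SciPy's LU routines are our own assumptions:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def modified_newton(f, jac, y0, m=5, tol=1e-8, max_steps=200):
        """Newton iteration which re-evaluates and re-factorises the
        Jacobian only every m-th step; as m grows the convergence
        rate falls towards the linear rate noted in the text."""
        y = np.asarray(y0, dtype=float)
        for s in range(max_steps):
            if s % m == 0:                      # refresh F occasionally
                lu = lu_factor(jac(y))
            fy = f(y)
            if np.max(np.abs(fy)) < tol:
                return y
            y = y - lu_solve(lu, fy)
        raise RuntimeError("no convergence")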
³This estimate of the computational burden assumes matrix inversion by factorisation; see Hughes Hallett and Fisher (1990). In fact, throughout this paper 'inversion' will be taken to imply inversion by factorisation methods. However, further computational savings might be made by using sparse matrix techniques, and they are considered separately below.
Duff, Erisman, and Reid (1986) show that the computational complexity of solving (2) can be reduced to O(c̄²n) per step, where c̄ = Σ n_i/n is the average number of nonzero entries per row of the Jacobian matrix. This is a significant improvement; and since econometric models typically have three or four endogenous variables on the right of each equation, it would leave each Newton step three or four times more expensive than its linear (e.g., Gauss-Seidel) counterpart, which has a computational burden of O(Σ n_i). To that comparison we must add the cost of obtaining numerical values of the partial derivatives in F - a further n iterations at least, as noted above. Thus a sparse Newton algorithm would become competitive if it were more than four or five times faster than its linear counterpart. That suggests that the main contribution of the hybrid algorithms which follow will actually be to extend the radius of convergence of the underlying iterations, and to do so automatically, rather than to provide faster or cheaper solutions as such.

2.3. Equation reordering

Following Nepomiastchy and Ravelli (1978) and Gabay et al. (1980), Don and Gallo (1987) have argued that economic models can often be reordered to produce a structure containing a small block of simultaneous equations, plus pre- and post-recursive blocks of equations. This would reduce the computational burden since our solution algorithm now need only be applied to the smaller simultaneous subsystem - the remaining equations being solved recursively given the solution to the simultaneous block. Moreover the simultaneous block may also be reordered to minimise the number of nonzero above-diagonal elements in the system's Jacobian matrix, F = ∂f/∂y, and to ease the evaluation of the partial derivatives in F.⁴ A two-part iteration, the inner loop being a simultaneous equation solution of the feedback variables and the outer loop a recursive solution of the remaining variables in the simultaneous block, may then be applied to the simultaneous block before the rest of the model is solved recursively. Hence within the simultaneous block we have a recursive solution for the nonfeedback variables,

    y_n^(k+1) = h₁(y_f^(k), y_n^(k), z),   (5)

and an inner loop on the feedback variables,

    y_f^(j) = h₂(y_f^(j-1), y_n^(k), z),   (6)

where y_n^(k) denotes a converged value from the previous inner loop and (y_f^(0), y_n^(0)) defines the start.
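Schematically, the two-part iteration (5)-(6) nests an inner fixed-point loop for the feedback variables inside an outer recursive pass. In the sketch below, h1 and h2 are placeholder functions for a reordered model, and the tolerances are illustrative:

    import numpy as np

    def two_part_solve(h1, h2, yf0, yn0, z, tol=1e-8,
                       max_outer=200, max_inner=200):
        """Outer loop: recursive update (5) of the non-feedback block
        yn; inner loop: iteration (6) on the feedback block yf (a
        Newton step could equally be used for the inner loop)."""
        yf = np.asarray(yf0, dtype=float)
        yn = np.asarray(yn0, dtype=float)
        for _ in range(max_outer):
            yn_new = h1(yf, yn, z)                      # Eq. (5)
            for _ in range(max_inner):                  # Eq. (6)
                yf_new = h2(yf, yn_new, z)
                if np.max(np.abs(yf_new - yf)) < tol:
                    break
                yf = yf_new
            if np.max(np.abs(yn_new - yn)) < tol:
                return yf, yn_new
            yn = yn_new
        raise RuntimeError("no convergence")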
⁴An important disadvantage of this approach is that reordering algorithms are themselves computationally expensive to run. Fortunately Levy and Low (1988) have shown how to use a reordering algorithm so that the time taken does not exceed the time saved in the solution phase. It will be assumed that this condition has been met in what follows.
If the simultaneous block contains m equations, but p < m feedback variables, and if the Newton method is used for the inner loop, the computational cost will fall from O(m³) to O(p³) + O(m - p) calculations. However the rate of convergence will have been slowed down to the linear rate of the extra interloop Gauss-Seidel steps. But if not too much is lost in that way, there could be substantial savings overall. On the other hand, the same reordering techniques can also be applied to the linear first-order methods which follow, so the computational savings may not be so large in relative terms. Nevertheless all our tests are based on reordered systems in order to incorporate these savings.

2.4. Gauss-Seidel techniques: The linear case

Consider the linear system

    Ay = b,   (7)

where A ∈ R^{n×n} is a known real matrix with nonvanishing diagonal elements, and y and b are the vectors of unknown and predetermined variables, respectively. Stationary first-order linear iterations can be used to construct numerical solutions given an arbitrary start y^(0):

    y^(s) = Gy^(s-1) + c,   s = 1, 2, 3, ....   (8)
These iterations are based on the splitting A = P - Q, where P is nonsingular and G = P⁻¹Q and c = P⁻¹b. The most widely used techniques are the Jacobi, Gauss-Seidel, and Successive Overrelaxation (SOR) methods,

    P = D,   P = (D - L),   P = (1/α)D(I - αD⁻¹L),   (9)

respectively, where D and L are matrices of order n such that

    D_ij = A_ij if i = j, 0 otherwise,   L_ij = -A_ij if i > j, 0 otherwise,

where U = D - A - L and α is a parameter to be chosen. Without loss of generality, let D = I. The rate of convergence in (8) can be increased by the one-parameter extrapolation

    y^(s+1) = γ(Gy^(s) + c) + (1 - γ)y^(s) = Hy^(s) + γc,   (10)

where H = γG + (1 - γ)I.
Two particular versions of (10) are routinely used: the Jacobi overrelaxation (JOR) method with G = I - A, and the Fast Gauss-Seidel (FGS) method with G = (I - αL)⁻¹(αU + (1 - α)I).

Convergence analysis: The convergence theory for first-order linear iterations can be summarised as follows. Proofs will be found in Hughes Hallett (1984a, 1986).
(i) Eq. (8) converges if and only if ρ(G) < 1, where ρ(·) denotes the spectral radius. Hence:

(ii) Lemma 1. Eq. (10) converges for some γ > 0 if and only if a_j < 1, j = 1, ..., n, where G has eigenvalues λ_j = a_j + i·b_j and i = √(-1).
(iii) The number of steps to convergence in (10), given ρ(H) < 1 and a convergence criterion of

    max_i |y_i^(s) - y_i^(s-1)| / |y_i^(s-1)| < τ,   (11)
is approximately log τ/log ρ(H). The corresponding convergence speed can be measured as its asymptotic rate (-log ρ(H)) which, in view of (4), implies a linear rate of convergence (r = 0). We therefore need to minimise ρ(H) in order to maximise the radius of convergence, as well as to minimise the number of steps to convergence.

(iv) Lemma 2. Let r_j² = (a_j - 1)² + b_j², β_j = -2(a_j - 1)/r_j², and β = min_j(β_j). Under the conditions of Lemma 1, the optimal value for γ in (10) is

    γ* = min_k { min{γ_k, β} } > 0,

where γ₁ = min_{j=1,...,n}{(1 - a_j)/r_j²} defines a₁, and γ_k = min_{j>k-1}{2(a_{k-1} - a_j)/(r_j² - r_{k-1}²)} defines a_k (for k = 2, ..., n - 1). It is convergent only if 0 < γ < β.
(v) Lemma 3. There exists α > 0 small enough such that SOR in (8) is convergent if the matrix B = I - A has all eigenvalues with real parts less than unity. But it converges only if 0 < α < 2.

(vi) It has been possible to extend Lemma 3 only for a few special cases, for example, where A is cyclic, symmetric, and positive definite, or irreducibly diagonally dominant.
2.5. Gauss-Seidel techniques: The nonlinear case

Consider now the nonstationary iteration

    y^(s) = G^(s-1) y^(s-1) + k^(s-1),   (12)
where the iteration matrix G^(s-1) is dependent on the iteration path y^(s-1), ..., y^(0). For example, JOR and SOR iterations applied to (1) are obtained by choosing a normalisation, say y_i = g_i(y), and then computing, respectively,

    y_i^(s) = γ g_i(y_1^(s-1), ..., y_n^(s-1)) + (1 - γ) y_i^(s-1)   (13a)

or

    y_i^(s) = α g_i(y_1^(s), ..., y_{i-1}^(s), y_i^(s-1), ..., y_n^(s-1)) + (1 - α) y_i^(s-1)   (13b)

for i = 1, ..., n and s = 1, 2, .... The FGS iterates are obtained by replacing the complete vector y^(s) from (13b) with the extrapolation γy^(s) + (1 - γ)y^(s-1) and then using the extrapolated values as the start for step s + 1, i.e.,

    y_i^(s-1/2) = α g_i(y_1^(s-1/2), ..., y_{i-1}^(s-1/2), y_i^(s-1), ..., y_n^(s-1)) + (1 - α) y_i^(s-1)

and

    y^(s) = γ y^(s-1/2) + (1 - γ) y^(s-1).   (14)
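An element-by-element version of the FGS step (14), for a model already written in the normalised form y_i = g_i(y, z), might look as follows; the two-equation example at the end is purely illustrative:

    import numpy as np

    def fgs_nonlinear(g, y0, z, alpha=1.0, gamma=1.0, tol=1e-10,
                      max_steps=1000):
        """One SOR sweep (13b) to get y^(s-1/2), then the FGS
        extrapolation y^(s) = gamma*y^(s-1/2) + (1-gamma)*y^(s-1)."""
        y = np.array(y0, dtype=float)
        for _ in range(max_steps):
            y_half = y.copy()
            for i in range(y.size):            # Gauss-Seidel sweep, Eq. (14)
                y_half[i] = (alpha * g[i](y_half, z)
                             + (1.0 - alpha) * y[i])
            y_new = gamma * y_half + (1.0 - gamma) * y
            if np.max(np.abs(y_new - y)) < tol:
                return y_new
            y = y_new
        raise RuntimeError("no convergence")

    # Example: y1 = 0.5*cos(y2), y2 = 0.5*sin(y1) + z (a contraction).
    g = [lambda y, z: 0.5 * np.cos(y[1]),
         lambda y, z: 0.5 * np.sin(y[0]) + z]
    print(fgs_nonlinear(g, [0.0, 0.0], z=0.2, alpha=1.0, gamma=0.9))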
Convergence: The nonlinear JOR, SOR, and FGS iterations are special cases of (12) with G^(s-1) = γB^(s-1) + (1 - γ)I, or (I - αL^(s-1))⁻¹(αU^(s-1) + (1 - α)I), or γ(I - αL^(s-1/2))⁻¹(αU^(s-1) + (1 - α)I) + (1 - γ)I, respectively, where B^(s-1) = (∂g/∂y)|_{y^(s-1)} has upper and lower triangular submatrices U^(s-1) and L^(s-1). Convergence to a fixed point y*, given an arbitrary start y^(0) within a neighbourhood of y*, then follows if ρ(G*) < 1, where G* is G^(s-1) evaluated at y* (Ostrowski, 1966). The rate of convergence will be linear for any of these methods.

2.6. Second-order (semi-iterative) techniques

Iterations of this type are obtained by accelerating the whole sequence of y^(s) iterates, that is, by constructing the weighted sums

    u^(s) = Σ_{k=0}^{s} d_{s,k} y^(k)   with   Σ_k d_{s,k} = 1,   (15)
where the first-order extrapolative scheme at (8) generates the y^(k) values. Let (7) have solution y*. Then

    u^(s) - y* = Σ_k d_{s,k} y^(k) - y* = Σ_k d_{s,k} e^(k) = p_s(G) e^(0),   s = 0, 1, ...,   (16)

where y^(k) - y* = e^(k), since d_{0,0} = 1 and u^(0) = y^(0). Now (16) contains the matrix polynomial

    p_s(G) = Σ_{k=0}^{s} d_{s,k} G^k,   (17)

and for any norm and an arbitrary start y^(0), we have

    ||u^(s) - y*|| ≤ ||p_s(G)|| · ||e^(0)||.   (18)
Hence, the sequence u^(s) converges onto y* most rapidly if, at each s, we pick the optimal set of weights d_{s,k}, k = 0, ..., s, in the sense of minimising ||p_s(G)||. In fact it turns out that the optimal weights are related through a second-order (Chebychev) recursion which, when put back into (15), produces a second-order iteration in u^(s). As a result the operational form of the semi-iterative acceleration of (8), where G may either be a general FGS iteration matrix or one of the simpler cases, is

    y^(s) = γ_s G* y^(s-1) + (1 - γ_s) y^(s-2) + k_s,   (19)

where

    γ_s = (1 - γ_{s-1}/(4w²))⁻¹,   γ₀ = 2,
    w = (2 - (a* + a**))/δ,
    δ = max_j {(2a_j - (a* + a**))² + 4b_j²}^{1/2},
    G* = (2G - (a* + a**)I)/(δw),   k_s = 2γ_s c/(wδ).
In this notation, G itself has eigenvalues λ_j = a_j + i·b_j, j = 1, ..., n, where a** = min_j(a_j) and a* = max_j(a_j). Like Lemma 2, this extrapolation device requires knowledge of all the eigenvalues of G. However it can be simplified by assuming that a* = -a** = ρ(G). In that case ρ(G) = δ/2 = w⁻¹, and (19) becomes

    y^(s) = γ_s G y^(s-1) + (1 - γ_s) y^(s-2) + γ_s c,   (20)
where γ_s = (1 - γ_{s-1}ρ(G)²/4)⁻¹ and γ₀ = 2. This simplification removes the need for accurate eigenvalue evaluations, thus saving a considerable amount of auxiliary calculation, but it is not known how much this might damage the convergence properties of (19).

Convergence: Originally semi-iterative accelerations were shown to converge for the special case where G is Hermitian and ρ(G) < 1 (Young, 1971). Later results showed convergence for more general equation systems (Manteuffel, 1977); and eventually for iterates generated by (8) with any real matrix G and any start values y^(0) (Hughes Hallett, 1984b). Obviously that means convergence is also guaranteed for nonlinear systems within a suitable neighbourhood of y*. But it may be difficult to obtain global convergence or maximum convergence speed since that would require accurate eigenvalue evaluations at every step. Thus (19) can be constructed to converge for any real matrix G and any equation system. But the simplification in (20) may still fail to converge in some cases.
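The simplified second-order scheme (20) is short to state in code. The sketch below accelerates the linear iteration (8), taking an estimate of ρ(G) as given - how to obtain that estimate cheaply is exactly the issue discussed in Section 4 - and the example system is our own:

    import numpy as np

    def chebyshev_accelerate(G, c, rho, y0, tol=1e-10, max_steps=500):
        """Simplified second-order iteration (20):
        y(s) = gamma_s*(G y(s-1) + c) + (1 - gamma_s)*y(s-2),
        with gamma_s = 1/(1 - gamma_{s-1}*rho^2/4), gamma_0 = 2."""
        y_prev = np.array(y0, dtype=float)
        y = G @ y_prev + c                 # plain first step of (8)
        gamma = 2.0
        for _ in range(max_steps):
            gamma = 1.0 / (1.0 - gamma * rho**2 / 4.0)
            y_new = gamma * (G @ y + c) + (1.0 - gamma) * y_prev
            if np.max(np.abs(y_new - y)) < tol:
                return y_new
            y_prev, y = y, y_new
        raise RuntimeError("no convergence")

    # Example: accelerate y = G y + c with known rho(G) = 0.9.
    G = np.array([[0.0, 0.9], [0.9, 0.0]])
    c = np.array([0.1, 0.1])
    y_star = np.linalg.solve(np.eye(2) - G, c)
    print(chebyshev_accelerate(G, c, rho=0.9, y0=np.zeros(2)), y_star)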
3. Automatic switching devices

3.1. A switching device for Newton algorithms
The major disadvantage of the Newton method is its computational expense. The repeated derivative evaluations and matrix inversions require calculations of the order of sn²(n + 1) to reach the s-th step, whereas linear first-order iterations such as Gauss-Seidel require only sΣn_i (where n_i is the number of unknown variables in the i-th equation). Since n_i averages perhaps 3 or 4 in most economic models, whereas n is often around 50 or more, the work involved per Gauss-Seidel step is typically up to an order of magnitude lower than that in the Newton method.⁵ The crucial point is therefore that Gauss-Seidel methods trade slower convergence for less computation per step. Hughes Hallett and Fisher (1990), for example, show that this trade-off favoured the Gauss-Seidel based schemes in the context of six econometric models of the UK economy. In fact Gauss-Seidel techniques had computational burdens which were a fraction of a single Newton or modified Newton step. In such cases it would certainly pay to switch to Gauss-Seidel techniques as soon as they are convergent at sufficient speed to yield a lower overall burden.
⁵Inclusive of sparse calculation techniques and partial derivative evaluations, as detailed in Section 2.2. The comparison with a reordered system, with a simultaneous block of m equations and p feedback variables, might seem more favourable: s[p²(p + 1) + Σ_{i=1}^{m-p} n_i] for Newton compared to s Σ_{i=1}^{m} n_i for Gauss-Seidel. However, the slower convergence of the Newton algorithm, due to the interloop iterations, may make the total computational burden less favourable to Newton than it might appear on the basis of a fixed number of steps. Without loss of generality, we have written these computational burdens as if the feedback equations were ordered last within the simultaneous block.
The second and more important point is that neither Gauss-Seidel nor Newton methods are guaranteed to converge from any start values. Moreover, because the factors which determine their convergence are quite different, they are likely to have quite different radii of convergence. That means that, in certain circumstances, one will be convergent and the other divergent. For Newton, it is starts outside the neighbourhood of y* defined by the set C which prevent convergence. But the exact location of C cannot be known ex ante. By contrast, Gauss-Seidel will fail to converge if the system's Jacobian matrix has eigenvalues greater than unity in real part. In other words, a switching mechanism should be designed to switch to the convergent algorithm automatically if the other should prove divergent. Indeed, since the second-order iterations (19) have guaranteed convergence, it would be helpful to include them as well, since that will allow us to extend the convergence property to global convergence.

3.2. The number of steps to convergence

To get the projected number of steps to convergence by any iterative process, we note that Eq. (4) implies
    ||y^(s) - y*|| ≤ δ^q ||y^(0) - y*||^{(1+r)^s},   q = Σ_{j=0}^{s-1} (1 + r)^j.   (21)
Hence, log||y^(s) - y*|| - log||y^(0) - y*|| ≤ q log δ + [(1 + r)^s - 1] log||y^(0) - y*||, and, for the usual convergence test, the total number of steps to convergence will be found by solving
    log τ ≥ log δ Σ_{j=0}^{s-1} (1 + r)^j + [(1 + r)^s - 1] log||y^(0) - y*||   (22)
for s. Hence, in the Gauss-Seidel case where r = 0, S_G = log τ/log δ. That will also apply to any other method with linear convergence rates. Now let e_s^i = ||y^(s) - y*||, where i = N means e_s^N is obtained using a Newton step and i = G means e_s^G is obtained using a Gauss-Seidel step like (10).⁶

⁶Since y* is unknown, e_s^i will, in practice, be approximated by e_s^i = ||y^(s) - f(y^(s), z)||.
Define δ_s^G = e_s^G/e_{s-1}^G and δ_s^N = e_s^N/e_{s-1}^N. The estimated number of steps to convergence for a linear first-order method, and Gauss-Seidel in particular, is now S_G = log τ/log δ_s^G since (21) implies

    δ = log⁻¹{log||y^(s) - y*|| - (1 + r) log||y^(s-1) - y*||} = ||y^(s) - y*|| / ||y^(s-1) - y*||^{1+r}   (23)
and r = 0. Similarly the estimated number of steps to convergence for any method with superlinear convergence (r > 0) will, in view of (22), be given by solving

    log τ = [(1 + r)^s - 1][(1 + r) log δ_s^N + log e_s^N]   (24)

for s, since q is the sum of a geometric progression in (1 + r). Solving (24) using (23) yields

    S_N ≈ log{ log τ/[(1 + r) log δ_s^N + log e_s^N] + 1 } / log(1 + r).   (25)

Hence the ratio of the number of steps to convergence by the superlinear Newton method to the number by the linear method is given by

    ρ_s = S_G/S_N = (log τ/log δ_s^G) · log(1 + r) / log{ log τ/[(1 + r) log δ_s^N + log e_s^N] + 1 }.   (26)

To evaluate (26) we use the fact that r = 0 in Gauss-Seidel and r = 1 for Newton's method.
r = 1 for
device
The switching device can now be constructed the composite iteration
as follows. At each stop define
Y 6’ = w,y$’ + (1 - w,)yj;s),
(27)
with initial conditions yt) = yg) = y(O), where y:’ and yl;“’ are the current iterates from y’“- ‘) by the Newton and Gauss-Seidel methods, respectively. Set w1 = 1, and thereafter
a,
ws=l>
a’= ”
[
.Pi 2n3 + 8n2 + 3n
1.
(28)
We have used the fact that each step of the Gauss-Seidel algorithm involves Σ n_i arithmetic operations, and 2n³ + 8n² + 3n in the Newton algorithm (Hughes Hallett and Fisher, 1990). If, by reordering the equations, the Newton algorithm can be confined to the feedback set of a simultaneous block - with a two-part iteration as described at the end of Section 2.3 - then α_s in (28) will be replaced by

    α_s = ρ_s [Σ n_i / (2p³ + 8p² + 3p + Σ_{i=1}^{m-p} n_i)],   (29)
since each step of the Newton algorithm plus the remaining simultaneous block variables costs 2p³ + 8p² + 3p + Σ_{i=1}^{m-p} n_i operations. Finally, to rule out any dependence on a divergent algorithm, and to switch to Gauss-Seidel as soon as its computational burden falls below that of Newton, we must impose the following extra conditions on w_s:

    w_s = 1   if δ_s^G ≥ 1 (i.e., Gauss-Seidel is diverging),
    w_s = 0   if w_s ≤ 1/2 by (28) (i.e., Gauss-Seidel is now cheaper),
              or if δ_s^N ≥ 1 (i.e., Newton is diverging).   (30)
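Putting (27), (28), and (30) together, one pass of the switching logic might be coded as below; we read (28) as w_s = α_s/(1 + α_s), the reading consistent with the w_s ≤ 1/2 threshold in (30), and the packaging is ours:

    def switching_weight(rho_s, dG, dN, sum_ni, n):
        """Weight w_s for the composite step (27), using (28) with the
        per-step costs sum_ni (Gauss-Seidel) and 2n^3+8n^2+3n (Newton),
        plus the boundary conditions (30)."""
        if dG >= 1.0:            # Gauss-Seidel diverging: pure Newton
            return 1.0
        if dN >= 1.0:            # Newton diverging: pure Gauss-Seidel
            return 0.0
        alpha = rho_s * sum_ni / (2.0 * n**3 + 8.0 * n**2 + 3.0 * n)
        w = alpha / (1.0 + alpha)
        return 0.0 if w <= 0.5 else w    # Gauss-Seidel now cheaper

    # Composite step (27): y = w*y_newton + (1 - w)*y_gauss_seidel.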
3.4. A switching device for second-order linear algorithms
Finally, if we are to use a second-order method instead of Newton in order to obtain global convergence, we have only to replace ρ_s in (26) with

    ρ_s = S_G/S_S = (log τ/log δ_s^G) / (log τ/log δ_s^S) = log δ_s^S / log δ_s^G,   (31)

where δ_s^S = e_s^S/e_{s-1}^S is the corresponding error ratio for the second-order iteration, since both algorithms have linear convergence and hence r = 0. We must also replace (28) with

    α_s = ρ_s [Σ n_i / (Σ n_i + n)]   and   w_s = α_s/(1 + α_s),   (32)

where ρ_s is given by (31) rather than (26). The Σ n_i + n term in (32) is the number of arithmetic operations in one step of the second-order algorithm specified in Eqs. (19) or (20), ignoring any eigenvalue evaluations. The boundary restrictions on w_s given in (30) still apply, however.
4. Implementation

Two further modifications were made to the underlying Newton and second-order algorithms in order to improve their convergence properties in the numerical exercises which follow. These modifications are not an inherent part of either algorithm, but were found useful in this particular exercise and might be employed more widely.

The first is that the ordinary Newton algorithm, (3), failed to converge in our test problem. To extend its radius of convergence - so that we can actually use a Newton switching algorithm and compare it to the second-order switching algorithm - we have replaced the Newton phase by a Newton-Marquardt version. In practice this means adding a diagonal matrix to F^(s-1) in (3), to yield

    y^(s) = y^(s-1) - [F^(s-1) + φ_s I]⁻¹ f(y^(s-1), z),   (33)
where φ_s is a parameter to be set by the user. We chose φ_s = 1 - δ₀ exp(-δ₁s), where δ₀ = 0.998 and δ₁ = 0.001. Thus φ₀ = 0.002, and φ_s rises slowly to unity (unless we switch to Gauss-Seidel), as recommended by Nash (1979). The reason for this is that Nash shows that the rate of convergence of the Newton-Marquardt algorithm rises as φ_s → 1 from below, but is slower for smaller φ_s values. But if φ_s is very small, we have approximately the superlinear convergence of Newton methods, although this will also be rather slow if we start outside the set C and Newton methods are not convergent. So we start with small φ_s values to search for a convergent sequence, and then speed things up by increasing φ_s until we switch out of the Newton algorithm. This modification makes no material difference to the computational burden or to the method for calculating the number of steps to convergence S_N.

The second innovation was to replace the path-dependent acceleration parameter γ_s in our simplified second-order scheme (20) by a constant value of γ_s = 0.85 for s ≥ 1. This was done because accurate eigenvalue evaluations at every step s in order to evaluate w, δ, and G* properly via (19) - or even ρ(G) accurately in the approximation (20) - would be extremely expensive and would destroy any advantage which this switching algorithm might have. Instead we first tried to use (20) with ρ(G) estimated automatically from past iterates as follows:

    ρ_{s+1} = ± |y_i^(s) - y_i^(s-1)| / |y_i^(s-1) - y_i^(s-2)|,   (34)

where '+' is used if (y_i^(s) - y_i^(s-1))(y_i^(s-1) - y_i^(s-2)) < 0, but '-' otherwise, and i denotes the element which changed most at step s. This is 'rule 1' taken from Hughes Hallett (1985). But, as so often happens if the power series method is used to estimate the spectral radius which underlies (34) in only a few iterations, the
acceleration parameter becomes unstable and a poor estimate of what is needed to ensure convergence at each step. In fact, in our case it was sufficiently unstable to prevent convergence. The usual recommendation, in that case, is to smooth the acceleration parameter. One way of doing that is to replace ρ_s with a constant value representing the average value obtained from (34).
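In code, one Newton-Marquardt step (33) with the φ_s schedule used here is straightforward; jac is assumed to supply F (analytically or by differencing), and the packaging is our own illustration:

    import numpy as np

    def newton_marquardt_step(f, jac, y, s, d0=0.998, d1=0.001):
        """One step of (33): y <- y - [F + phi_s*I]^{-1} f(y,z), with
        phi_s = 1 - d0*exp(-d1*s) rising slowly from 0.002 towards 1."""
        phi = 1.0 - d0 * np.exp(-d1 * s)
        F = jac(y)
        step = np.linalg.solve(F + phi * np.eye(y.size), f(y))
        return y - step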
5. A numerical illustration: Consumption and the optimal rate of oil extraction
5.1. The model to be solved
Suppose society has a welfare function to be maximised,

    max W = Σ_{t=0}^{T-1} U(C_t)/(1 + r)^t,   (35)
where C_t is global consumption, U is the utility function U(C_t) = log C_t, r is the discount rate, and T is the optimisation period. The constraints on the maximisation are

    S_{t+1} = S_t - R_t,   (36)

    K_{t+1} = K_t + F(K_t, R_t) - C_t,   t = 0, 1, ..., T - 1,   (37)

and S_T = K_T = 0, where S₀ and K₀ are given. In this model, S_t is the stock of oil, R_t is the oil extraction, K_t is the capital stock, F(K_t, R_t) = A K_t^0.9 R_t^0.1 is the Cobb-Douglas type production function, and A is a parameter. In this simple model, we assume there is no extraction cost or capital depreciation. We also assume there will be a backstop technology to replace oil as the energy input for production at the end of the optimisation period, t = T. So we set S_T = 0. The capital stock K_t associated with the oil input will be useless for the backstop technology and therefore will be consumed completely, i.e., K_T = 0. Implicitly, we have assumed a new type of capital stock will be built up out of the consumption C_t. So the consumption C_t in our model should be interpreted as the actual consumption plus the new capital stock. However, the problem of how much consumption should be saved to build this new capital stock remains unspecified.

This is a discrete-time model. We consider a ten-period (t = 0, 1, ..., 9) time horizon in blocks of ten years starting from 1990 (t = 0). For start values, the world oil reserve is estimated as

    S₀ = 11.5 (100 billion barrels).
The world capital stock is

    K₀ = 4.913 (10 trillion US$).

The parameter A in the world production function is

    A = 3.968,

which is estimated under the assumption that the world aggregate output over the 1980s was 179.3 trillion US$ and the aggregate oil extraction was 212.7 billion barrels. We also assume the discount rate r = 5%. Now we try to solve the above dynamic optimisation problem by the Maximum Principle. Define the current-value Hamiltonian as
+ P,+ I( - R,) + A+ I(F(K,, R,) - C, + K),
(38)
t = 0, 1, . . . , T -1, where P, is the shadow price for the oil and Atis the shadow price for the capital. The necessary conditions are P ,+ 1 - P, = r~, =s.
(aff,/as,) = r~,
P t+l =(l+r)P,,
1 t+ 1 - 1, = rl, - WWW =a
At+l
= ((1 + YYU
t=
1, . . . . T,
= rA - A+ IFK,K,
(39) RJ
t = 1, . . . , T,
+ FK,)) 4,
(40)
where Fx, = FK,(Kt, R,) = aF(K,, R,)/CIK,,
BH,/BR,=O=-p,+l+I,+IFR,,t=O,l,
,T--1,
(41)
where FR, = aF(K,, R,)/aR,, -
aH,/ac, =o = uyc,)-A,+~,t
where V’(C) = XJ(C,)/aC,.
=o,i,
,T-1,
(42)
Thus, in summary, we have to solve the following simultaneous equations:

    P_t = (1 + r)P_{t-1},   (43)

    λ_t = [(1 + r)/(1 + F_{K_{t-1}}(K_{t-1}, R_{t-1}))] λ_{t-1},   (44)

    C_t = 1/λ_{t+1},   (45)

    R_t = (λ_{t+1}/P_{t+1}) · 0.1 · F(K_t, R_t),   (46)

    K_t = K_{t-1} + F(K_{t-1}, R_{t-1}) - C_{t-1},   (47)

together with a recursive equation in stocks,

    S_t = S_{t-1} - R_{t-1}.   (48)

Table 1
The ten-period solution of Eqs. (43) to (48)

            C_t          K_t          P_{t+1}   λ_{t+1}   R_t       S_t
    1990       3.73342       4.9130    0.2123    0.2678    2.2770   11.5000
    2000      13.6819       19.2306    0.2229    0.0731    1.9947    9.2230
    2010      45.3301       66.3834    0.2341    0.0221    1.7232    7.2283
    2020     137.2777      203.8886    0.2458    0.0073    1.4637    5.5051
    2030     383.6060      560.4334    0.2581    0.0026    1.2167    4.0413
    2040     997.3350     1381.1982    0.2710    0.0010    0.9824    2.8247
    2050    2430.1080     3038.6624    0.2845    0.0004    0.7611    1.8423
    2060    5584.8970     5870.2742    0.2987    0.0002    0.5525    1.0812
    2070   12174.5621     9502.4897    0.3137    0.0001    0.3564    0.5287
    2080   25298.4825    10936.5854    0.3293    2.5E-5    0.1724    0.1724

C_t and K_t are measured in terms of trillions of US dollars, S_t and R_t in hundreds of billions of barrels.
That implies a five-equation system which is highly nonlinear. But, because of the mix of forward-looking variables and lagged variables, we have to solve for all five endogenous variables in all ten periods simultaneously. In order to do that we stack the endogenous variables over the ten time periods, in the manner of Fisher and Hughes Hallett (1988). That produces a 50-equation system with the solution given in Table 1.

To achieve that solution we have experimented with both a Newton switching algorithm and with a second-order switching algorithm. In the Newton case we can save computation by restricting the simultaneous equation solution to just a feedback block of two equations (those in K_t and λ_t) per time period. That implies a computational burden of

    2p³ + 8p² + 3p + Σ_{i=1}^{m-p} n_i = 19390

operations per step, where p is the number of feedback variables (p = 20 here: two feedback variables in each of ten periods), and we use the same system of iteration counts as we did for the complete system. Similarly, with 50 equations, a standard Gauss-Seidel (linear first-order) algorithm would cost Σ n_i = 130 operations per step. The corresponding second-order technique would require Σ n_i + n = 180 operations per step.
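To make the stacking concrete, the sketch below assembles the 50 residuals of (43)-(47) plus the terminal conditions S_T = 0 and K_T = 0, and hands them to a library solver purely as a check on the system. The K, P, and λ start guesses marked below are our own, purely hypothetical, and - as the text stresses - a poor start may well fail to converge:

    import numpy as np
    from scipy.optimize import fsolve

    A, r_, T = 3.968, 0.05, 10
    S0, K0 = 11.5, 4.913
    F  = lambda K, R: A * np.maximum(K, 1e-9)**0.9 * np.maximum(R, 1e-9)**0.1
    FK = lambda K, R: 0.9 * F(K, R) / np.maximum(K, 1e-9)   # dF/dK

    def residuals(x):
        """Stacked residuals of (43)-(47); x holds (C_t, K_t, P_{t+1},
        lam_{t+1}, R_t) for t = 0..T-1.  The clipping in F is a crude
        safeguard against negative trial values along the solution path."""
        C, K, P, lam, R = x.reshape(5, T)
        return np.concatenate([
            C - 1.0 / lam,                                        # (45)
            R - 0.1 * (lam / P) * F(K, R),                        # (46)
            P[1:] - (1.0 + r_) * P[:-1],                          # (43)
            lam[1:] * (1.0 + FK(K[1:], R[1:]))
                - (1.0 + r_) * lam[:-1],                          # (44)
            K[1:] - K[:-1] - F(K[:-1], R[:-1]) + C[:-1],          # (47)
            [K[0] - K0,                                           # K_0 given
             S0 - R.sum(),                                        # S_T = 0, via (48)
             K[-1] + F(K[-1], R[-1]) - C[-1]],                    # K_T = 0
        ])

    # Start values as in the text: C_t = 5 and a linearly declining R_t.
    x0 = np.concatenate([np.full(T, 5.0), np.full(T, K0),
                         np.full(T, 0.25), np.full(T, 0.1),
                         np.linspace(2.2, 0.2, T)])
    sol, info, ier, msg = fsolve(residuals, x0, full_output=True)
    print(ier, np.max(np.abs(residuals(sol))))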
5.2. The computational burdens compared

Given that, the number of iterations to convergence, with the optimal switching mechanisms defined by (26), (29), (30) and by (31), (32), (30) respectively, is given in Table 2. The starting values were given above, except that an arbitrary value of C_t = 5 (t = 0, 1, ..., 9), and R_t as a linearly declining function over time, were taken to start the algorithm off. The results show that convergence can be achieved rather efficiently even when the underlying Gauss-Seidel algorithm is divergent. The standard Newton algorithm is also divergent, but the second-order iteration is convergent even in the approximate form (20).

It is also clear that we would best switch into the slower but cheaper Gauss-Seidel algorithm comparatively early on. This is because of the extreme cost of the Newton iterations. Table 2 shows that Newton supplies only 12% of the iterations, but costs 95% of the total workload, in our Newton switching algorithm. In contrast, the second-order algorithm supplies 1.2% of the iterations, but just 2% of the workload, in our second-order switching algorithm. In that case, the early switch is due to the relatively slow convergence of the second-order iteration component. Nevertheless it is convergent, whereas the standard Newton algorithm was not. Hence the relative success of the second-order algorithm is the success of a slow but steady Volkswagen vs. the highly strung but vulnerable performance of a Ferrari.⁷
Table 2
The number of iterations and workload by Newton and second-order switching algorithms

                        Newton switching                     Second-order switching
                        Newton-     FGS       Total          Second-     FGS       Total
                        Marquardt   phase                    order       phase
                        phase                                phase
    No. of iterations         59       437        496             30       2565       2595
    Operation count      1144010     56810    1200820           5400     333450     338850
Fig. 1. Workload of the hybrid Newton algorithm with the switch point set as ||y^(s) - y*||₂ < 10^k, for different values of k.
And the results are clear to see. The overall workload of the second-order switching algorithm is much lower than that of the Newton algorithm. Including optimal switching, the workload of the second-order switching algorithm is just 28% of, i.e. 3.6 times lower than, the workload of the Newton switching algorithm.

5.3. Sensitivity to the switch point

As a final conclusion, our results also show that good algorithm design is critical, as convergence and the workload appear to be sensitive to the choice of switch point. If the optimal formulae are not used, Fig. 1 shows that to switch too late causes a steady build-up in the computational costs. But to switch too early would cause a dramatic explosion in costs because of the nonconvergence (or slow convergence) of the underlying Gauss-Seidel iterations.
⁷Table 2 might suggest that the second-order algorithm was pretty fast too: it needs only 30 iterations to Newton's 59. But this is only because it switches much earlier; the FGS workload to convergence is much higher under second-order switching than under Newton switching. In view of (31) and (32), this must be because the rate of convergence is broadly similar to that of the slower FGS method.
6. Conclusion

This paper has shown how hybrid algorithms can be used both to extend the radius of convergence of conventional solution algorithms in difficult cases and to reduce the computational burden of the overall solution process at the same time. That is an important contribution since algorithms with wide convergence radii tend to be slow or extremely expensive, while those that are cheap or fast tend to be rather sensitive to conditions of nonconvergence (especially in strongly nonlinear systems) and hence to the choice of start values. Hybrid algorithms with automatic switching therefore confer substantial practical advantages. We illustrated the computational savings when solving an intertemporal optimal resource extraction problem, and showed that the computational burdens can fall by as much as a factor of 4 compared to the best that the individual component algorithms offer. But our most important contribution is the ability to extend the radius of convergence of our standard solution techniques automatically - that is, without intervention by the user - for the awkward and highly nonlinear models which emerge from economic theory.
References

Becker, R. and B. Rustem, 1993, Algorithms for solving nonlinear dynamic decision models, Annals of Operations Research 44, 117-142.
Dennis, J.E. and J.J. Moré, 1977, Quasi-Newton methods: Motivation and theory, SIAM Review 19, 46-89.
Don, F.J.H. and G.M. Gallo, 1987, Solving large sparse systems of equations in econometric models, Journal of Forecasting 6, 167-180.
Duff, I.S., A.M. Erisman, and J.K. Reid, 1986, Direct methods for sparse matrices (Oxford University Press, Oxford).
Fisher, P.G. and A. Hughes Hallett, 1988, An efficient strategy for solving nonlinear rational expectations models, Journal of Economic Dynamics and Control 12, 635-657.
Gabay, D., P. Nepomiastchy, M. Rachdi, and A. Ravelli, 1980, Numerical methods for the simulation and optimal control of large scale macroeconomic models, in: A. Bensoussan, P. Kleindorfer, and C. Tapiero, eds., Applied stochastic control in econometrics and management science (North-Holland, Amsterdam).
Hughes Hallett, A., 1984a, Simple and optimal extrapolations for first order iterations, International Journal of Computer Mathematics 15, 309-318.
Hughes Hallett, A., 1984b, Second order iterations with guaranteed convergence, Journal of Computational and Applied Mathematics 10, 285-291.
Hughes Hallett, A., 1985, Techniques which accelerate the convergence of first order iterations automatically, Linear Algebra and its Applications 68, 115-130.
Hughes Hallett, A., 1986, The convergence of accelerated over-relaxation iterations, Mathematics of Computation 47, 219-223.
Hughes Hallett, A. and P.G. Fisher, 1990, On economic structures and model solution methods, Oxford Bulletin of Economics and Statistics 52, 317-330.
Levy, H. and D.W. Low, 1988, A contraction algorithm for finding small cycle cutsets, Journal of Algorithms 9, 470.
Manteuffel, T.A., 1977, The Tchebychev iteration for nonsymmetric linear systems, Numerische Mathematik 28, 307-327.
Nash, J.C., 1979, Compact numerical methods for computers (Adam Hilger, Bristol).
Nepomiastchy, P. and A. Ravelli, 1978, Adapted methods for solving and optimising quasi-triangular models, Annals of Economic and Social Measurement 6, 555.
Ortega, J.M. and W.D. Rheinboldt, 1970, Iterative solution of nonlinear equations in several variables (Academic Press, New York, NY).
Ostrowski, A., 1966, Solution of equations and systems of equations, 2nd ed. (Academic Press, New York, NY).
Young, D.M., 1971, Iterative solution of large linear systems (Academic Press, New York, NY).