Applied Numerical Mathematics 55 (2005) 32–47 www.elsevier.com/locate/apnum
Implicit and adaptive inverse preconditioned gradient methods for nonlinear problems

Jean-Paul Chehab a,*, Marcos Raydan b,1

a Laboratoire de Mathématiques Paul Painlevé, UMR 8524, Université de Lille 1, France; Laboratoire de Mathématiques, CNRS, UMR 8628, Equipe ANEDP, Université Paris Sud, Orsay, France
b Departamento de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Ap. 47002, Caracas 1041-A, Venezuela
Available online 26 November 2004
Abstract

Numerical schemes for approximating the inverse of a given matrix, using an ordinary differential equation (ODE) model, were recently developed. We extend that approach to solve nonlinear minimization problems, within the framework of a preconditioned gradient method. The main idea is to develop an automatic and implicit scheme to approximate directly the preconditioned search direction at every iteration, without an a priori knowledge of the Hessian of the objective function, and involving only a reduced and controlled amount of storage and computational cost. The new scheme allows us to obtain Newton's direction asymptotically by improving the accuracy of the ODE solver associated with the implicit scheme. We present extensive and encouraging numerical results on some well-known test problems, on the problem of computing the square root of a given symmetric and positive definite matrix, and also on a nonlinear Poisson type equation.
© 2004 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: Unconstrained minimization; Preconditioned gradient method; Inverse preconditioners; ODE preconditioners; Roots of matrices; Nonlinear Poisson equation
* Corresponding author.
E-mail addresses: [email protected] (J.-P. Chehab), [email protected] (M. Raydan).
1 This author was supported by Fonacit Project UCV-97003769.
doi:10.1016/j.apnum.2004.10.004
1. Introduction

Consider the unconstrained optimization problem

\[
\min_{x \in \mathbb{R}^n} f(x), \tag{1}
\]

where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient is available. We are mainly interested in large-scale problems for which the Hessian of $f$ is either not available or requires a prohibitive amount of storage and computational cost. The well-known Newton's method for solving (1) can be written as

\[
x_{k+1} = x_k - H_k^{-1} g_k, \tag{2}
\]

where $g_k = \nabla f(x_k)$, $H_k = H(x_k)$, and $H(x)$ is the Hessian of $f$ evaluated at $x$. If $H_k = \lambda_k I$, where $\lambda_k > 0$ is the minimizer of $f$ along the direction $-g_k$, then (2) reduces to the Cauchy [4] or steepest descent method. If $H_k = \lambda_k I$ and

\[
\lambda_k = \frac{s_{k-1}^t y_{k-1}}{s_{k-1}^t s_{k-1}},
\]
where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$, then (2) becomes the Barzilai and Borwein method [2], also known as the spectral gradient method [17,18]. See also [7,10,11] for additional results on this topic. Recently [15], a preconditioned scheme, associated with the gradient direction, was proposed to solve (1). Roughly speaking, the iterates are given by

\[
x_{k+1} = x_k - \alpha_k^{-1} z_k, \tag{3}
\]

where $z_k = G_k^{-1} g_k$, $G_k$ is a nonsingular approximation to $H(x_k)$, and the scalar $\alpha_k$ is given by

\[
\alpha_k = (-\alpha_{k-1})\, \frac{z_{k-1}^t y_{k-1}}{z_{k-1}^t g_{k-1}}. \tag{4}
\]
In (3), if $G_k = I$ for all $k$, then $z_k = g_k$, $\alpha_k^{-1}$ coincides with the spectral gradient step length, and so the method reduces to the spectral gradient method. On the other hand, if the sequence of iterates converges to $x^*$ and we improve the quality of the preconditioner so that $G(x_k)$ converges to $H(x^*)$, then, as we will discuss later in Section 3, $\alpha_k$ tends to 1 and we recover Newton's method, which possesses fast local convergence under standard assumptions [8,9,16]. In that sense, the iterative scheme (3) is flexible: by choosing suitable approximations $G_k$, it allows intermediate options between the negative gradient direction and the Newton direction. Interesting numerical results were obtained in [15] on different applications, but a specially designed preconditioner had to be developed for each particular application (e.g., nonlinear Poisson-type equations and multidimensional scaling problems). In some cases it is possible to build a reasonable preconditioner that approximates the Hessian of $f$ near the solutions; unfortunately, this is not possible in many other applications. In general, problem-dependent preconditioners require a deep knowledge of the problem by the user, which is not always available. Moreover, problem-dependent preconditioners do not necessarily converge to the Hessian near the solutions when a certain approximation factor is improved, or it could also happen that $G_k$ becomes dense when it approaches the exact Hessian. Consequently, the Newton direction is never reached, even if a significant effort and computational cost is allowed in the iterative process.
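For concreteness, the following is a minimal sketch of iteration (3)–(4) in the unpreconditioned case $G_k = I$, i.e., the spectral gradient method. The function name, its arguments, and the quadratic test problem are illustrative and not taken from the paper.

```python
import numpy as np

def spectral_gradient(grad, x0, alpha0=1.0, tol=1e-6, maxit=500):
    """Scheme (3)-(4) with G_k = I (no preconditioning): the spectral
    (Barzilai-Borwein) gradient method. Illustrative sketch."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0
    for _ in range(maxit):
        if np.linalg.norm(g) <= tol:
            break
        z = g.copy()                           # z_k = G_k^{-1} g_k with G_k = I
        x_new = x - z / alpha                  # (3): x_{k+1} = x_k - alpha_k^{-1} z_k
        g_new = grad(x_new)
        y = g_new - g
        alpha = (-alpha) * (z @ y) / (z @ g)   # (4)
        x, g = x_new, g_new
    return x

# Usage on a small convex quadratic f(x) = 0.5 x^T A x - b^T x
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = spectral_gradient(lambda x: A @ x - b, x0=np.zeros(3))
print(np.linalg.norm(A @ x_star - b))
```

With a nontrivial preconditioner, the only change is that $z_k = G_k^{-1} g_k$ replaces $g_k$ in the update; the rest of this paper is about producing such a $z_k$ automatically.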
We will extend the recent work by Chehab [5] for approximating the inverse of a given matrix (see also [6]), using an ordinary differential equation (ODE) model, to the nonlinear problem (1) within the framework of the iterative preconditioned scheme (3). Our plan is to develop an automatic and implicit scheme to approximate directly the preconditioned direction $z_k$ at every iteration, without an a priori knowledge of the Hessian of $f$, and involving only a reduced and controlled amount of storage and computational cost. As we will discuss in Section 3, this new scheme avoids as much as possible the cost of any calculations involving matrices, and also allows us to obtain Newton's direction by improving the accuracy of the ODE solver associated with the implicit scheme (see, e.g., [3]), to be discussed also in Section 2. In Section 4 we present extensive and encouraging numerical results on some well-known test problems, on the problem of computing the square root of a given symmetric and positive definite matrix, and also on a nonlinear Poisson type equation.
2. Motivation for the new preconditioning scheme

2.1. Modeling of the problem

The derivation of a numerical scheme for solving a given problem is usually governed by the way the problem is modeled mathematically. Therefore, the construction of a numerical method can be achieved in two steps:

• Modeling of the problem
• Discretization/numerical solution

This approach applies of course to minimization problems like (1). Assume for simplicity that $f$ possesses a unique minimum $x^* \in \mathbb{R}^n$. It is classical to obtain $x^*$ by imposing the first order necessary conditions, i.e., by computing a zero of $\nabla f$. At this point we can consider two different points of view.

2.1.1. The geometrical point of view

We consider the graph of $f$ (an $n$-dimensional manifold $z = f(x)$) and define $x^*$ as the local minimum; $x^*$ is computed iteratively by taking a descent direction. More precisely, we generate the sequence

\[
x_{k+1} = x_k - \alpha_k z_k, \qquad x_0 \ \text{given}, \tag{5}
\]

where $\alpha_k$ is the descent step length and $z_k$ is the descent direction. If $z_k$ is collinear to $g_k = \nabla f(x_k)$ we obtain a simple gradient scheme. If $z_k \simeq H_k^{-1} g_k$, where $H_k$ is the Hessian at $x_k$, we obtain a preconditioned gradient scheme; most of the methods can be defined following this stencil.

2.1.2. A dynamical system point of view

We consider $x^*$ as the steady state of a dynamical system, the simplest being the following gradient flow (see, e.g., Helmke and Moore [12]):

\[
\frac{dx(t)}{dt} = -\nabla f\bigl(x(t)\bigr), \quad t > 0, \qquad x(0) = x_0. \tag{6}
\]
Of course, the application of an explicit time marching scheme to the above system generates an iterative process identical to that of a simple gradient method. The convergence can be speeded up by preconditioning the dynamical system, for which we now give the derivation. The preconditioned gradient method can be obtained by applying a forward Euler method to the so-called continuous Newton method (see, e.g., Alvarez et al. [1] and references therein)

\[
\frac{dx(t)}{dt} = -H\bigl(x(t)\bigr)^{-1} \nabla f\bigl(x(t)\bigr), \quad t > 0, \qquad x(0) = x_0. \tag{7}
\]
Remark 1. The linearized equation at the solution $x^*$ is, writing $x = x^* + \delta$,

\[
\frac{d\delta}{dt} = -\delta, \tag{8}
\]

so $x^*$ is asymptotically stable. Notice that in this case the steepest descent method selects automatically $\Delta t = 1$ as the time step, which is nothing else but the step of the Newton method.

Now, if we set $z(t) = H(x(t))^{-1}\nabla f(x(t))$, we obtain the system

\[
\frac{dx(t)}{dt} = -z(t), \quad t > 0, \qquad H\bigl(x(t)\bigr) z(t) - \nabla f\bigl(x(t)\bigr) = 0, \qquad x(0) = x_0, \ z(0) = z_0. \tag{9}
\]
The preconditioning equation $H(x(t))z(t) - \nabla f(x(t)) = 0$ appears as an algebraic constraint. Once again, if we apply a forward Euler scheme to this system, we recover the stencil of the preconditioned gradient methods, the preconditioning equation being solved exactly or approximately at each step. It is however natural to relax the preconditioning constraint by introducing the temporal variations of $z(t)$ as

\[
\frac{dx(t)}{dt} = -z(t), \quad t > 0, \qquad \varepsilon\,\frac{dz(t)}{dt} = \nabla f\bigl(x(t)\bigr) - H\bigl(x(t)\bigr) z(t), \qquad x(0) = x_0, \ z(0) = z_0. \tag{10}
\]

Here $\varepsilon > 0$ is a given real number. Of course the steady state of this last system, $(x, z) = (x^*, 0)$, is nothing else but the solution of $\nabla f(x) = 0$, so the consistency of the problem remains unchanged.

Remark 2. The relaxation of the constraint we propose here is similar to the classical one used for solving the incompressible Navier–Stokes equations: the incompressibility condition $\operatorname{div}(u) = 0$ is replaced by $\varepsilon\,\frac{\partial p}{\partial t} + \operatorname{div}(u) = 0$ (the so-called artificial compressibility equation), see [20].
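The following small sketch illustrates the relaxed system (10) numerically, assuming a quadratic objective so that the Hessian is a constant matrix, and using a plain forward Euler discretization (Section 3 discusses marching schemes with better stability). The matrix, step size and all names are illustrative.

```python
import numpy as np

# Relaxed preconditioning system (10), integrated by forward Euler on a
# quadratic f(x) = 0.5 x^T H x - b^T x, so grad f(x) = Hx - b and the Hessian
# is the constant SPD matrix H. Illustrative sketch only.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: H @ x - b

eps, dt = 1.0, 0.2
x = np.array([5.0, -5.0])      # x(0)
z = np.zeros(2)                # z(0)
for _ in range(400):
    x_new = x - dt * z                          # dx/dt = -z
    z_new = z + (dt / eps) * (grad(x) - H @ z)  # eps dz/dt = grad f(x) - H(x) z
    x, z = x_new, z_new

print(np.linalg.norm(grad(x)), np.linalg.norm(z))   # both should be small
```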
In order to increase the versatility of the approach, system (10) can be further extended as follows. Let $T:\mathbb{R}^n \to \mathbb{R}^n$ be a regular (at least $C^1$) function such that $T(0) = 0$. We consider the system

\[
\frac{dx(t)}{dt} = -z(t), \quad t > 0, \qquad \varepsilon\,\frac{dz(t)}{dt} = T\bigl(\nabla f(x(t)) - H(x(t))\, z(t)\bigr), \qquad x(0) = x_0, \ z(0) = z_0. \tag{11}
\]

Clearly, we recover system (10) when $T$ is the identity map $I$, and $(x, z) = (x^*, 0)$ is again the steady state of the system. This new formulation allows us to develop semi-implicit schemes (see [3]), carrying more stability into the framework.

Before we establish our convergence result, we need the following lemma.

Lemma 1. Let $M$ be a square matrix with real and positive eigenvalues. Then, all the eigenvalues of the block matrix

\[
J = \begin{pmatrix} 0 & -I \\ M & -M \end{pmatrix}
\]

have negative real part.

Proof. Let $w = (u, v)^t$ be an eigenvector of $J$ and $\lambda$ the associated eigenvalue. We obtain the relations

\[
v = -\lambda u, \qquad M u - M v = \lambda v,
\]

so

\[
M u = -\frac{\lambda^2}{1+\lambda}\, u.
\]

The vector $u$ is hence an eigenvector of $M$ and we have

\[
\mu = -\frac{\lambda^2}{1+\lambda}, \qquad \mu \in SP(M),
\]

where $SP(M)$ denotes the spectrum of the matrix $M$. From the above equation we infer that

\[
\lambda = \frac{-\mu \pm \sqrt{\mu^2 - 4\mu}}{2}.
\]

Now let $\mu \in SP(M)$. If $\mu \geq 4$, then $\lambda$ is real and strictly negative. If $\mu < 4$, then $\Re(\lambda) = -\mu/2 < 0$, and the result follows. □
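A quick numerical check of Lemma 1, for a randomly generated symmetric positive definite matrix $M$ (so its eigenvalues are real and positive); this is only an illustration, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
M = B @ B.T + n * np.eye(n)        # symmetric positive definite: real, positive eigenvalues

I = np.eye(n)
Z = np.zeros((n, n))
J = np.block([[Z, -I], [M, -M]])   # block matrix of Lemma 1

print(np.linalg.eigvals(J).real.max())   # strictly negative, as the lemma asserts
```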
At this point we establish the following convergence result.

Proposition 1. Assume that $f$ is twice continuously differentiable in an open neighborhood of the strict local minimizer $x^*$. Then $(x^*, 0)$ is an asymptotically stable equilibrium point of (10), for all $\varepsilon > 0$.

Proof. Following the classical approach (see, e.g., [14]), we show that the eigenvalues of the Jacobian matrix $J$ of the vector field of (10) at the steady state all have strictly negative real part. We have

\[
J = \begin{pmatrix} 0 & -I \\ \frac{1}{\varepsilon} H(x^*) & -\frac{1}{\varepsilon} H(x^*) \end{pmatrix}.
\]
Since $H(x^*)$ is symmetric and positive definite, and so is $\frac{1}{\varepsilon}H(x^*)$, by Lemma 1 all the eigenvalues of $J$ have negative real part. Therefore, the application of the classical stability theorem guarantees that there exists a neighborhood $V$ of $(x^*, 0)$ such that, for any initial condition in $V$,

\[
\lim_{t \to +\infty} \bigl(x(t), z(t)\bigr) = (x^*, 0). \qquad \square
\]
By similar arguments, under mild assumptions on $T$, we can establish the same result for the convergence of system (11). The asymptotic stability of $(x^*, 0)$ allows us to compute this steady state numerically by implementing an explicit time scheme, as proposed below.

2.2. Gradient method with a preconditioning constraint

We now propose to combine the two preceding approaches: a time marching scheme is used for solving (approximately) the preconditioning equation in (10) with $\varepsilon = 1$, while the gradient-like method is used for updating $x$. This gives rise to the general iterative scheme:

Step 1 (computation of $z_k$): compute approximately a steady state of

\[
\frac{dz(t)}{dt} = \nabla f(x_k) - H(x_k)\, z(t), \qquad z(0) = z_{k-1}. \tag{12}
\]

Step 2 (updating of $x_k$): set $x_{k+1} = x_k - \alpha_k z_k$.

The scheme is completely defined once $\alpha_k$ and Step 1 are given (see Section 3).

Remark 3. The way in which the preconditioning will be implemented depends on the knowledge of the Hessian matrix at each step. We distinguish three basic situations:

(1) The computation of $H_k$, for all $k \geq 0$, is costly, or we do not have any knowledge of the matrix $H_k$. In this case we approximate directly the vector $H_k z$ by

\[
G_k z = \frac{\nabla f(x_k + h z) - \nabla f(x_k)}{h}, \qquad \forall z \in \mathbb{R}^n, \ |h| \ll 1
\]

(see the sketch after this remark).

(2) A preconditioner can be built for $H_0$, but $H_k$ is difficult to build. The preconditioning will then be a constant one.

(3) $H_k$ can be built and stored at a low cost per iteration. If an inverse preconditioner $G_0$ of $H_0$ is known, the preconditioning will consist in updating $G_k$ at each step.
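A minimal sketch of the finite-difference product of case (1), with an illustrative helper name; the test on a quadratic function is included only to show that the approximation matches the exact Hessian–vector product.

```python
import numpy as np

def hess_vec_fd(grad, x, z, h):
    """Finite-difference approximation of H(x) z used in Remark 3, case (1):
    G_k z = (grad f(x + h z) - grad f(x)) / h. Illustrative helper name."""
    return (grad(x + h * z) - grad(x)) / h

# Check against the exact Hessian of f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
x = np.array([0.3, -0.7])
z = np.array([1.0, 2.0])
print(hess_vec_fd(grad, x, z, h=1e-7), A @ z)   # the two vectors should agree closely
```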
3. The new schemes

3.1. Time marching schemes

The efficiency of the preconditioning step depends on the quality of the approximation of the steady state and on the amount of computation needed. Hence, a good choice for the time marching scheme is one which allows us to take large time steps, the accuracy in time not being a priority; in numerical analysis, this is known as large-time stability.
The computational effort is of course limited when the scheme is explicit. To ensure both that the scheme is explicit and that it is stable, the steady state must be asymptotically stable. We will assume this in the sequel for simplicity. However, there is no loss of generality in such an assumption since it is possible to transform the preconditioning equation so that the steady states are stable, see [3]. In [3], it was proposed to compute fixed points $F(x) = x$ as steady states of

\[
\frac{dx}{dt} = F(x) - x = M(x), \qquad x(0) = x_0, \tag{13}
\]

by using the enhanced forward scheme:

\[
K_1 = M(x_k), \qquad K_2 = M(x_k + \Delta t\, K_1), \qquad x_{k+1} = x_k + \Delta t\,\bigl(\alpha K_1 + (1 - \alpha) K_2\bigr). \tag{14}
\]
Here $\alpha$ is a parameter to be fixed. This scheme allows a larger stability as compared with the forward Euler scheme, see [6]. Of course, one can define $\alpha$ and $\Delta t$ iteratively so as to minimize the Euclidean norm of the residual, exactly as in the steepest descent method. For example, if $M(x) = b - Ax$, where $A$ is an $n \times n$ positive definite matrix and $b$ a given vector in $\mathbb{R}^n$, we have

\[
\Delta t_k = \frac{fb - ed}{fc - e^2}, \qquad \alpha_k = (fc - e^2)\,\frac{eb - cd}{(fb - ed)^2}. \tag{15}
\]

We have set $r_k = b - A x_k$ and

\[
a = \|r_k\|^2, \quad b = \langle A r_k, r_k\rangle, \quad c = \|A r_k\|^2, \quad d = \langle A^2 r_k, r_k\rangle, \quad e = \langle A^2 r_k, A r_k\rangle, \quad f = \langle A^2 r_k, A^2 r_k\rangle. \tag{16}
\]
Such a scheme can be applied to the preconditioning equation, where the matrix–vector products are approximated by finite differences, namely

\[
A r_k \simeq \frac{\nabla f(x_k + h r_k) - \nabla f(x_k)}{h} = G_k r_k, \qquad A^2 r_k \simeq \frac{\nabla f(x_k + h G_k r_k) - \nabla f(x_k)}{h}. \tag{17}
\]
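The following sketch shows one update of the enhanced scheme (14) for $M(x) = b - Ax$. Rather than reproducing the $(\Delta t_k, \alpha_k)$ parametrization literally, the two free coefficients of the update are obtained here from the least-squares system that minimizes the Euclidean norm of the new residual, which is what (15)–(16) encode; the names and the test data are illustrative.

```python
import numpy as np

def enhanced_step(A, rhs, x):
    """One update of the enhanced scheme (14) for M(x) = b - Ax. The two free
    parameters are chosen to minimize the Euclidean norm of the new residual,
    which is the role played by (15)-(16). Illustrative sketch."""
    r = rhs - A @ x
    Ar = A @ r
    A2r = A @ Ar
    # Inner products of (16)
    b = r @ Ar; c = Ar @ Ar; d = r @ A2r; e = Ar @ A2r; f = A2r @ A2r
    t = (f * b - e * d) / (f * c - e * e)   # Delta t, as in (15)
    s = (e * b - c * d) / (f * c - e * e)   # coefficient of A r_k in the x-update
    return x + t * r - s * Ar               # residual becomes r - t*A r + s*A^2 r

# Usage: a few iterations on a small SPD system
rng = np.random.default_rng(1)
B = rng.standard_normal((20, 20))
A = B @ B.T + 20 * np.eye(20)
rhs = rng.standard_normal(20)
x = np.zeros(20)
for _ in range(10):
    x = enhanced_step(A, rhs, x)
print(np.linalg.norm(rhs - A @ x))
```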
3.2. Description of the new schemes

In the algorithms that result from the combination of the time marching and the gradient method, we can distinguish the outer iterations, which correspond to the gradient iterative process, from the inner iterations, which correspond to the computation of the (local, in time) steady state (see (12)). For the sake of simplicity, we will present all the schemes assuming that $T = I$.

Scheme 1. Outer iterations: preconditioned spectral gradient method. Inner iterations: steepest descent method.
The resulting scheme is:

Solution of the preconditioning equation:
    Set $z_{k-1}^0 = z_{k-1}$, $j = 0$
    While $j < N_1$ and $\|r_j\| > \eta$:
        $r_j = \nabla f(x_k) - \dfrac{\nabla f(x_k + h z_{k-1}^j) - \nabla f(x_k)}{h}$
        $G_k r_j = \dfrac{\nabla f(x_k + h r_j) - \nabla f(x_k)}{h}$
        Set $t_j = \dfrac{r_j^t G_k r_j}{(G_k r_j)^t G_k r_j}$
        $z_{k-1}^{j+1} = z_{k-1}^j + t_j r_j$
        $j \to j + 1$
    Set $z_k = z_{k-1}^j$
Computation of $x_{k+1}$:
    Compute $\alpha_k$ using (4)
    Update $x_{k+1} = x_k - \alpha_k^{-1} z_k$
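A minimal Python sketch of Scheme 1 is given below, assuming only a gradient oracle. The choice of $h_k$ follows Section 4, the initial inner guess and the test function are illustrative, and no globalization is included, in line with the local-convergence focus of the paper.

```python
import numpy as np

def scheme1(grad, x0, n_inner=10, eta=1e-7, tol=1e-6, maxit=500):
    """Sketch of Scheme 1: outer preconditioned spectral gradient iteration (3)-(4),
    inner steepest-descent-type marching on the preconditioning equation (12),
    with Hessian-vector products replaced by finite differences (Remark 3, case (1))."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    z = g.copy()            # warm start: the first inner solve starts from the gradient
    alpha = 1.0
    for _ in range(maxit):
        if np.linalg.norm(g) <= tol:
            break
        h = 1e-10 / min(1.0, np.linalg.norm(g))     # h_k as in Section 4
        Gv = lambda v: (grad(x + h * v) - g) / h    # finite-difference G_k v, as in (17)
        # Inner iterations: drive r = grad f(x_k) - G_k z towards zero
        for _ in range(n_inner):
            r = g - Gv(z)
            if np.linalg.norm(r) <= eta:
                break
            Gr = Gv(r)
            t = (r @ Gr) / (Gr @ Gr)                # step length of Scheme 1
            z = z + t * r
        # Outer update: x_{k+1} = x_k - alpha_k^{-1} z_k, then alpha from (4)
        x_new = x - z / alpha
        g_new = grad(x_new)
        y = g_new - g
        alpha = (-alpha) * (z @ y) / (z @ g)
        x, g = x_new, g_new
    return x

# Usage on a simple separable strictly convex function (illustrative)
def grad_f(x):
    return x + x**3          # gradient of f(x) = 0.5||x||^2 + 0.25 sum x_i^4
x_sol = scheme1(grad_f, x0=np.full(50, 2.0))
print(np.linalg.norm(grad_f(x_sol)))
```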
Scheme 2. Outer iterations: preconditioned spectral gradient method. Inner iterations: enhanced steepest descent method.

The resulting scheme is:

Solution of the preconditioning equation:
    Set $z_{k-1}^0 = z_{k-1}$, $j = 0$
    While $j < N_1$ and $\|r_j\| > \eta$:
        $r_j = \nabla f(x_k) - \dfrac{\nabla f(x_k + h z_{k-1}^j) - \nabla f(x_k)}{h}$
        $G_k r_j = \dfrac{\nabla f(x_k + h r_j) - \nabla f(x_k)}{h}$
        $G_k^2 r_j = \dfrac{\nabla f(x_k + h G_k r_j) - \nabla f(x_k)}{h}$
        Compute $a, b, c, d, e, f$ by (16) using (17)
        Set $t_j = \dfrac{fb - ed}{fc - e^2}$, $\quad \lambda_j = (fc - e^2)\,\dfrac{eb - cd}{(fb - ed)^2}$
        Set $K_1 = r_j$, $\quad K_2 = r_j - t_j G_k r_j$
        Set $z_{k-1}^{j+1} = z_{k-1}^j + t_j\bigl(\lambda_j K_1 + (1 - \lambda_j) K_2\bigr)$
        $j \to j + 1$
    Set $z_k = z_{k-1}^j$
Computation of $x_{k+1}$:
    Compute $\alpha_k$ using (4)
    Update $x_{k+1} = x_k - \alpha_k^{-1} z_k$
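The sketch below isolates one inner update of Scheme 2, applying $G_k$ through the finite differences (17). As in the sketch after (17), the two coefficients of the combination of $K_1$ and $K_2$ are computed from the residual-minimizing least-squares system rather than from the closed formulas, which is algebraically equivalent; all names and the quadratic test are illustrative.

```python
import numpy as np

def scheme2_inner_step(grad, x, g, z, h):
    """One inner update of Scheme 2: an enhanced (two-stage) correction of z for
    the preconditioning equation G_k z = grad f(x_k), with G_k applied via the
    finite differences (17); the two coefficients minimize the norm of the new
    inner residual, which is the role of (15)-(16). Illustrative sketch."""
    Gv = lambda v: (grad(x + h * v) - g) / h
    r = g - Gv(z)                 # inner residual
    Gr = Gv(r)
    G2r = Gv(Gr)
    # Inner products (16), written with G_k in place of A
    b = r @ Gr; c = Gr @ Gr; d = r @ G2r; e = Gr @ G2r; f = G2r @ G2r
    t = (f * b - e * d) / (f * c - e * e)    # step length, as in (15)
    s = (e * b - c * d) / (f * c - e * e)    # coefficient of G_k r in the z-update
    return z + t * r - s * Gr

# Usage: a few corrections for a quadratic f, where G_k is exactly the Hessian
A = np.diag([1.0, 4.0, 9.0])
grad = lambda v: A @ v
x = np.array([1.0, 1.0, 1.0]); g = grad(x)
z = np.zeros(3)
for _ in range(5):
    z = scheme2_inner_step(grad, x, g, z, h=1e-7)
print(np.linalg.norm(g - A @ z))   # z approaches the Newton direction H^{-1} g
```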
Finally, since the marching procedure for building the preconditioner is based on the steepest descent method, it is also possible to use the spectral gradient method internally, as described in our next scheme.

Scheme 3. Outer iterations: preconditioned spectral gradient method. Inner iterations: spectral gradient method.

The resulting scheme is:

Solution of the preconditioning equation:
    Given $t_0 > 0$, set $z_{k-1}^0 = z_{k-1}$, $j = 0$
    While $j < N_1$ and $\|r_j\| > \eta$:
        $r_j = \nabla f(x_k) - \dfrac{\nabla f(x_k + h z_{k-1}^j) - \nabla f(x_k)}{h}$
        $G_k r_j = \dfrac{\nabla f(x_k + h r_j) - \nabla f(x_k)}{h}$
        Set $t_j = \dfrac{r_{j-1}^t r_{j-1}}{r_{j-1}^t G_k r_{j-1}}$
        $z_{k-1}^{j+1} = z_{k-1}^j + t_j r_j$
        $j \to j + 1$
    Set $z_k = z_{k-1}^j$
Computation of $x_{k+1}$:
    Compute $\alpha_k$ using (4)
    Update $x_{k+1} = x_k - \alpha_k^{-1} z_k$
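A sketch of the inner loop of Scheme 3 alone, with the spectral step length built from the previous inner residual; the quadratic test, where $G_k$ coincides with the exact Hessian, and all names are illustrative.

```python
import numpy as np

def scheme3_inner(grad, x, g, z, h, t0=0.2, n_inner=30, eta=1e-7):
    """Inner loop of Scheme 3: spectral-gradient marching on the preconditioning
    equation, with step length t_j = (r_{j-1}^t r_{j-1}) / (r_{j-1}^t G_k r_{j-1}).
    Illustrative sketch."""
    Gv = lambda v: (grad(x + h * v) - g) / h
    t = t0
    r_prev = None
    for _ in range(n_inner):
        r = g - Gv(z)
        if np.linalg.norm(r) <= eta:
            break
        if r_prev is not None:
            t = (r_prev @ r_prev) / (r_prev @ Gv(r_prev))
        z = z + t * r
        r_prev = r
    return z

# Usage: the inner solve alone, for a quadratic f where G_k is the exact Hessian
A = np.diag([2.0, 3.0, 5.0])
grad = lambda v: A @ v
x = np.ones(3); g = grad(x)
z = scheme3_inner(grad, x, g, z=np.zeros(3), h=1e-7)
print(np.linalg.norm(g - A @ z))
```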
It is worth mentioning that if the preconditioner is chosen such that $G_k$ tends to $H(x^*)$, where $x^*$ is the solution to which the sequence $\{x_k\}$ converges, then the step length sequence $\alpha_k$ tends to 1. This fact supports the comment, made in the introduction, that the preconditioned spectral gradient (SPG) method, in the presence of a good preconditioner, tends to reproduce Newton's method asymptotically in a neighborhood of $x^*$. Indeed, the SPG method can be written as $x_{k+1} = x_k - \alpha_k^{-1} G^{-1}(x_k) g_k$, which tends to Newton's method locally if $G(x_k) \approx H(x_k)$ and $\alpha_k \approx 1$. The convergence of $\alpha_k$ is discussed in our next result.

Proposition 2. Assume that the sequence generated by the SPG method converges to $x^*$, and let $f$ be twice continuously differentiable in an open neighborhood of $x^*$. If the sequence of preconditioning matrices $\{G_k\}$ converges to $H(x^*)$, then $\lim_{k\to\infty} \alpha_k = 1$.
Proof. The vector $y_{k-1}$ can be written as

\[
y_{k-1} = g_k - g_{k-1} = \Bigl(\int_0^1 H\bigl(x_{k-1} + t(x_k - x_{k-1})\bigr)\,dt\Bigr)(x_k - x_{k-1}).
\]

Hence, since $x_k - x_{k-1} = -\alpha_{k-1}^{-1} z_{k-1}$, (4) can be written as

\[
\alpha_k = \frac{z_{k-1}^t \Bigl(\int_0^1 H(x_{k-1} + t(x_k - x_{k-1}))\,dt\Bigr) z_{k-1}}{z_{k-1}^t g_{k-1}}.
\]

Now, using that $z_{k-1} = G_{k-1}^{-1} g_{k-1}$, we obtain

\[
\alpha_k = \frac{g_{k-1}^t G_{k-1}^{-1} \Bigl(\int_0^1 H(x_{k-1} + t(x_k - x_{k-1}))\,dt\Bigr) G_{k-1}^{-1} g_{k-1}}{g_{k-1}^t G_{k-1}^{-1} g_{k-1}}.
\]

Finally, since $\lim_{k\to\infty} \int_0^1 H(x_{k-1} + t(x_k - x_{k-1}))\,dt = H(x^*)$ and

\[
\lim_{k\to\infty} \Bigl(\int_0^1 H\bigl(x_{k-1} + t(x_k - x_{k-1})\bigr)\,dt\Bigr) G_{k-1}^{-1} = I,
\]

we have that $\lim_{k\to\infty} \alpha_k = 1$. □
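A small numerical illustration of Proposition 2, assuming the exact Hessian is available so that $G_k = H(x_k)$: the step lengths produced by (4) approach 1 as the iterates converge. The separable test function is illustrative.

```python
import numpy as np

# Illustration of Proposition 2: with G_k = H(x_k) (exact Hessian), the step
# lengths alpha_k from (4) approach 1. Simple separable strictly convex test.
grad = lambda x: x + x**3
hess = lambda x: np.diag(1.0 + 3.0 * x**2)

x = np.full(5, 1.5)
g = grad(x)
alpha = 1.0
for k in range(10):
    z = np.linalg.solve(hess(x), g)      # z_k = G_k^{-1} g_k
    x_new = x - z / alpha                # (3)
    g_new = grad(x_new)
    y = g_new - g
    alpha = (-alpha) * (z @ y) / (z @ g) # (4)
    print(k, alpha)                      # alpha approaches 1
    x, g = x_new, g_new
```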
4. Numerical results

We present numerical results on different types of problems to illustrate the behavior of our implicit and automatic preconditioned scheme. First, we consider standard test problems that can be found in the literature and that have been extensively used to test unconstrained optimization codes. Second, we discuss the behavior of the method when applied to the problem of finding the square root of a symmetric and positive definite matrix. Finally, we study the behavior of the method when applied to the solution of nonlinear Poisson equations. In every case we compare the new preconditioning schemes with the spectral gradient method with no preconditioning strategy (No Precond). All the experiments were run on a PC Pentium 4 with a 2.0 GHz clock and 512 MBytes of RAM, using MATLAB 6.5. All runs were stopped when

\[
\|g_k\|_2 \leq \text{tol}, \qquad \text{where } \text{tol} = 10^{-6}.
\]

In all the experiments we choose, for every external iteration $k$, $h \equiv h_k$ as follows:

\[
h_k = 10^{-10} / \min\bigl(1, \|g_k\|\bigr),
\]

and we verify that the two schemes (with and without preconditioning) converge to the same point.
4.1. Standard test problems

We test our codes on four well-known test functions. Table 1 lists the functions and the structure of their Hessians. A description of the functions and the starting points can be found in [18] and references therein. In this case we have taken $h_0 = 10^{-10}$. We summarize in Table 2 the behavior of Schemes 1, 2, and 3, and of the no-preconditioning version. We have chosen $\varepsilon = 10^{-6}$ as the global tolerance, and the tolerance to stop the inner iterations is $10^{-7}$. We have also fixed the maximal number of inner iterations at 10 for all experiments. We report the function and the dimension of the problem (func(n)), the number of iterations required for convergence (iter), the CPU time in seconds until convergence (time), and the number of inner iterations (inner). The symbol (***), which appears in some cases, means that the method could not solve the problem after 1000 external iterations. We observe that Scheme 2 is a very robust method to find local minimizers of nonquadratic functions. It outperforms Scheme 1 in number of iterations and CPU time in most cases. Both Schemes 2 and 3 clearly outperform the no-preconditioning strategy, except when the Hessian at the solution is well-conditioned, as in Oren's power function.
Table 1. Standard test functions

Function | Name                | Hessian
1        | Extended Rosenbrock | tridiagonal
2        | Oren's power        | dense
3        | Strictly convex 2   | diagonal
4        | Broyden tridiagonal | tridiagonal
Table 2. Results for Schemes 1, 2, and 3 on standard test functions

            No Precond       Scheme 1               Scheme 2              Scheme 3
func(n)     iter   time      iter  inner  time      iter  inner  time     iter  inner  time
1(1000)     ***    ***       202   2012   3.35      8     67     0.27     70    694    1.19
1(10000)    ***    ***       192   1912   16.31     8     69     1.08     79    784    6.9
1(100000)   ***    ***       164   1632   318.0     9     82     30.11    146   1454   284.1
2(1000)     29     0.05      7     70     0.14      6     54     0.12     6     60     0.08
2(10000)    62     0.3       21    210    1.28      7     70     0.81     15    150    0.97
2(100000)   115    4.9       90    900    73.42     24    240    39.4     38    380    30.19
3(1000)     ***    ***       496   4960   30.41     178   1780   3.53     236   2360   2.32
3(5000)     ***    ***       ***   ***    ***       964   9632   44.07    ***   ***    ***
4(1000)     29     0.1       8     80     0.12      5     49     0.12     7     70     0.1
4(10000)    31     0.21      7     70     0.87      5     50     0.66     7     70     0.51
4(100000)   284    21.17     8     72     12.71     5     49     13.97    8     80     11.84
4(1000000)  ***    ***       9     82     131.26    5     50     154.24   9     90     126.4
4.2. Computing the square root of a matrix

Consider the problem of finding $X \in \mathbb{R}^{n\times n}$ such that $F(X) = X^2 - A = 0$, where $A \in \mathbb{R}^{n\times n}$ is a given symmetric and positive definite matrix. In this case, we are applying the method on the vector space of $n \times n$ symmetric matrices; the matrix $-F(X)$ is a descent direction for the merit function $f(X) = \|F(X)\|_F^2$, and $F(X)$ plays the role of the gradient vector (see [19] for details). This problem has interesting applications and also a well-developed nontrivial theory behind it (see [13]). For example, there are many solutions, but only one of them is also symmetric and positive definite. If we start with a symmetric and positive definite initial guess and a positive initial steplength, then all the algorithms considered here converge to that unique SPD solution [19]. Notice that the matrix $F(X_k)$ is a dense matrix for $k \geq 1$, even when the initial guess is sparse. Therefore, the number of variables for this application is $n^2$. We fix $\varepsilon = 10^{-6}$ as the global tolerance, the tolerance for the inner iterations is again $10^{-7}$, and $h_0 = 10^{-6}$. We start the algorithms from $u_0 = A$ and $\alpha_0 = 1$. We have fixed the maximal number of inner iterations at 10 for all experiments. We report in Table 3 the results for a well-conditioned symmetric Cauchy and Toeplitz matrix, which can be obtained directly in Matlab as A = gallery('parter', n), followed by setting A = (A + A')/2. In our second experiment the given matrix $A$ is chosen as the one obtained from the standard 5-point 2D finite-difference discretization of the Laplacian operator on the unit square. The results are reported in Table 4. This matrix is obtained directly in Matlab as A = gallery('poisson', sqrt(n)).
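Before the tables, here is a minimal sketch of the unpreconditioned iteration for the matrix square root, with the residual $F(X_k)$ playing the role of the gradient and the Frobenius (trace) inner product used in the step-length formula (4). The small, well-scaled test matrix and all names are illustrative; no globalization is included, so a poorly scaled starting matrix may require safeguards.

```python
import numpy as np

def sqrtm_residual_iteration(A, maxit=500, tol=1e-6):
    """Sketch of the unpreconditioned spectral-gradient-type iteration for
    F(X) = X^2 - A = 0: the residual F(X_k) replaces the gradient, and the
    Frobenius (trace) inner product replaces the Euclidean one in the
    step-length update (4). Illustrative names; no globalization safeguard."""
    inner = lambda U, V: (U * V).sum()
    X = A.copy()                    # u0 = A and alpha0 = 1, as in Section 4.2
    F = X @ X - A
    alpha = 1.0
    for _ in range(maxit):
        if np.linalg.norm(F, 'fro') <= tol:
            break
        X_new = X - F / alpha
        F_new = X_new @ X_new - A
        Y = F_new - F
        alpha = (-alpha) * inner(F, Y) / inner(F, F)
        X, F = X_new, F_new
    return X

# Usage on a small, well-scaled SPD matrix (illustrative)
A = np.array([[1.5, 0.3], [0.3, 1.2]])
X = sqrtm_residual_iteration(A)
print(np.linalg.norm(X @ X - A))
```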
Table 3. Results for Schemes 1, 2, and 3 on $F(X) = X^2 - A$, when $A$ is the Parter matrix

          No Precond      Scheme 1              Scheme 2             Scheme 3
n²        iter   time     iter  inner  time     iter  inner  time    iter  inner  time
400       74     0.25     6     60     0.07     5     47     0.09    6     60     0.07
10000     168    2.01     10    100    1.08     5     50     1.18    10    100    1.06
40000     349    25.03    11    110    7.67     6     60     9.68    10    100    7.02
160000    ***    ***      15    150    79.84    7     70     86.5    12    120    64.48
Table 4. Results for Schemes 1, 2, and 3 on $F(X) = X^2 - A$, when $A$ is the Poisson matrix

          No Precond       Scheme 1              Scheme 2              Scheme 3
n²        iter   time      iter  inner  time     iter  inner  time     iter  inner  time
10000     344    2.46      11    110    1.17     6     59     1.35     7     70     0.85
160000    523    202.4     18    180    98.0     7     67     83.4     10    100    53.27
810000    328    1301.7    17    170    988.7    8     80     1017.2   12    120    648.1
Table 5. Results for Scheme 1 for the square root of the Poisson matrix when $n^2 = 160000$

maxinner   iter   inner   time
5          127    634     339.4
10         18     180     96.0
20         10     192     102.8
30         7      210     112.5
40         6      234     125.5
50         6      288     155.8
Fig. 1. Norm of the residual (semilog) for Scheme 1 with different choices of innermax.
From Tables 3 and 4, we observe that the new schemes outperform the nonpreconditioned version. In general, Scheme 2 is more robust than the others, and Scheme 3 is faster. In our last experiment we want to illustrate the effect of increasing the maximum number of inner iterations. For that we consider Scheme 1 on the Poisson matrix for $n^2 = 160000$. In Table 5 we report the number of outer iterations, the number of inner iterations and the CPU time (in seconds) for innermax between 5 and 50. In Fig. 1 we show the norm of the residual (semilog) for all choices of innermax. We observe that when the number of inner iterations increases, the preconditioning tends to reproduce Newton's direction and very fast local convergence is observed. However, this also represents an increase in the computational cost of the ODE solver. On the other hand, reducing the number of inner iterations increases the number of external iterations, by reducing the quality of the preconditioner, and this leads to a compromise between external iterations and computational cost. For this particular experiment the optimal choice of the maximum number of inner iterations is 10.
Table 6. Results for Schemes 1, 2, and 3 on the nonlinear Poisson equation

                  No Precond        Scheme 1             Scheme 2             Scheme 3
n²      h0        iter     time     iter  in.   time     iter  in.   time     iter  in.   time
2500    5.D–5     502      37.11    4     46    4.66     3     12    2.27     2     26    3.20
10000   5.D–5     1333     563.43   ***   ***   ***      3     12    13.23    3     31    21.07
40000   1.D–6     ***      ***      3     29    163.69   3     17    158.8    3     31    148.35
4.3. A nonlinear Dirichlet problem

Consider the problem

\[
-\frac{\partial}{\partial x}\Bigl(k(u)\frac{\partial u}{\partial x}\Bigr) - \frac{\partial}{\partial y}\Bigl(k(u)\frac{\partial u}{\partial y}\Bigr) = f(x, y) \ \text{ in } \Omega = \,]0,1[^2, \qquad u|_{\partial\Omega} = 0 \ \text{ on } \partial\Omega. \tag{18}
\]

Here $k(u)$ is a nonlinear function of $u$, and we assume that (18) has a unique solution. For $k(u) = u^2 + 1$, it is easy to show that there exists indeed a unique solution for problem (18). We take $f(x, y)$ such that the solution is $u^*(x, y) = x(1 - x)y(1 - y)$. The problem is discretized by second order finite differences on a regular mesh on the unit square; there are $n$ internal grid points per direction of the domain, with $\Delta x = 1/(n + 1)$ as the spatial step size. Therefore the numerical solution of (18) can be obtained by solving a nonlinear system of $n^2$ equations, see also [15]. Unlike the previous examples, we implemented the numerical schemes choosing $T \neq I$ (see (11)). Here, the nonlinear system to be solved can be written in the following way: $Ax + b(x) = 0$, where $A$ is a regular (fixed) matrix, namely the classical pentadiagonal discretization matrix of the Laplacian. Hence, we have chosen $T = A^{-1}$. This choice requires a linear system to be solved at each internal iteration, but it can be accomplished by any classical solver (e.g., preconditioned conjugate gradient method, multigrid method, Cholesky). In Table 6 some results are summarized. We have chosen $\varepsilon = 10^{-6}$ as the global tolerance and the tolerance for inner iterations is again $10^{-7}$. The initial guess is chosen as $u_0 = 0.6u^*$. We have fixed the maximal number of inner iterations at 20 for all values of $n$. For this experiment, the symbol (***) means that the method could not solve the problem after 2000 external iterations. Here again, the preconditioning techniques allow us to solve the problem efficiently and, as in the previous examples, Scheme 2 is the most robust since the number of external as well as internal iterations does not vary significantly with the dimension of the problem. For large problems Scheme 3 seems to be the fastest of the three options.
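As an illustration of the choice $T = A^{-1}$, the sketch below assembles the fixed pentadiagonal Laplacian matrix $A$ with SciPy and factors it once, so that applying $T$ inside every inner iteration reduces to a pair of triangular solves. The sparse LU factorization used here is just one possible "classical solver"; all names are illustrative and not from the paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def laplacian_2d(n):
    """Standard five-point discretization of -Laplace on the unit square with
    n interior points per direction (the fixed matrix A of Section 4.3)."""
    h = 1.0 / (n + 1)
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)) / h**2

n = 50
A = laplacian_2d(n).tocsc()
lu = splu(A)                      # factor A once; reused at every inner iteration

def apply_T(v):
    """T = A^{-1}: the choice of T used for the nonlinear Poisson problem."""
    return lu.solve(v)

# Example: apply T to a discretized source term
xs = np.arange(1, n + 1) / (n + 1)
X, Y = np.meshgrid(xs, xs, indexing='ij')
v = (X * (1 - X) * Y * (1 - Y)).ravel()
print(np.linalg.norm(A @ apply_T(v) - v))   # ~ machine precision
```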
5. Concluding remarks

We have presented preconditioned gradient methods that use an approximation of the Newton direction by means of an internal ODE numerical scheme. If the preconditioning scheme is not activated, then we end up with the negative gradient direction. In that sense, the new schemes can be viewed as a smooth
path between Cauchy's method, with a different choice of steplength, and Newton's method. The choice of steplength used in all the schemes is the spectral choice. We would like to stress the advantage that the spectral choice of steplength converges to 1 when the preconditioning operator approaches the inverse of the Hessian, so that the preconditioned direction tends to Newton's direction. If the steplength does not tend to 1, the fast local convergence is lost. As far as we know, the spectral choice of steplength is the only choice for nonquadratic functions with that property. Finally, we would like to mention that in this work we are mainly concerned with local convergence, for which preconditioning becomes an important acceleration ingredient. Of course, the spectral gradient method can be globalized, as described in [15]. However, to illustrate the power of the new schemes based on time marching acceleration procedures, the globalization strategies were ignored to simplify the presentation. In forthcoming research, we would like to apply these new schemes to large-scale real problems (e.g., incompressible steady Navier–Stokes equations), for which we will need to incorporate the globalization strategies.
Acknowledgements We are indebted to an anonymous referee for interesting comments and suggestions that helped us to enhance the quality of the paper.
References

[1] F. Alvarez, H. Attouch, J. Bolte, P. Redont, A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics, J. Math. Pures Appl. 81 (2002) 747–779.
[2] J. Barzilai, J.M. Borwein, Two point step size gradient methods, IMA J. Numer. Anal. 8 (1988) 141–148.
[3] C. Brezinski, J.-P. Chehab, Nonlinear hybrid procedures and fixed point iterations, Numer. Funct. Anal. Optim. 19 (1998) 465–487.
[4] A. Cauchy, Méthodes générales pour la résolution des systèmes d'équations simultanées, C. R. Acad. Sci. Paris 25 (1847) 536–538.
[5] J.-P. Chehab, Differential equations and inverse preconditioners, in preparation.
[6] J.-P. Chehab, J. Laminie, Differential equations and solution of linear systems, in preparation.
[7] Y.H. Dai, L.Z. Liao, R-linear convergence of the Barzilai and Borwein gradient method, IMA J. Numer. Anal. 22 (2002) 1–10.
[8] J.E. Dennis Jr., R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[9] R. Fletcher, Practical Methods of Optimization, Wiley, New York, 1987.
[10] R. Fletcher, Low storage methods for unconstrained optimization, in: Lectures in Applied Mathematics, vol. 26, American Mathematical Society, Providence, RI, 1990, pp. 165–179.
[11] R. Fletcher, On the Barzilai–Borwein method, Report NA/207, Department of Mathematics, University of Dundee, Dundee, Scotland, 2001.
[12] U. Helmke, J. Moore, Optimization and Dynamical Systems, Springer, Berlin, 1994.
[13] N.J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl. 88/89 (1987) 405–430.
[14] M.W. Hirsch, S. Smale, Differential Equations, Dynamical Systems and Linear Algebra, Academic Press, London, 1974.
[15] F. Luengo, M. Raydan, W. Glunt, T.L. Hayden, Preconditioned spectral gradient method, Numer. Algorithms 30 (2002) 241–258.
[16] J.M. Ortega, W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[17] M. Raydan, On the Barzilai and Borwein choice of steplength for the gradient method, IMA J. Numer. Anal. 13 (1993) 321–326.
[18] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM J. Optim. 7 (1997) 26–33.
[19] M. Raydan, Low-cost methods for nonlinear large-scale matrix equations, in: Y. Yuan (Ed.), Numerical Linear Algebra and Optimization, Science Press, Beijing, 2004, pp. 79–87.
[20] R. Temam, Navier–Stokes Equations, AMS Chelsea Publishing, American Mathematical Society, Providence, RI, 2001.