Nonlinear conjugate gradient methods with structured secant condition for nonlinear least squares problems

Michiya Kobayashi (a), Yasushi Narushima (b,*), Hiroshi Yabe (b)

(a) Accenture Technology Solutions, Japan
(b) Tokyo University of Science, 1-3, Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan
Article history: Received 31 July 2008; Received in revised form 27 February 2009

Keywords: Least squares problems; Conjugate gradient method; Line search; Global convergence; Structured secant condition

Abstract

In this paper, we deal with conjugate gradient methods for solving nonlinear least squares problems. Several Newton-like methods have been studied for solving nonlinear least squares problems, which include the Gauss–Newton method, the Levenberg–Marquardt method and the structured quasi-Newton methods. On the other hand, conjugate gradient methods are appealing for general large-scale nonlinear optimization problems. By combining the structured secant condition and the idea of Dai and Liao (2001) [20], the present paper proposes conjugate gradient methods that make use of the structure of the Hessian of the objective function of nonlinear least squares problems. The proposed methods are shown to be globally convergent under some assumptions. Finally, some numerical results are given.
1. Introduction

We consider conjugate gradient methods for minimizing a sum of squares of nonlinear functions

    min f(x) = (1/2) ‖r(x)‖²,   x ∈ R^n,    (1.1)
where r(x) = (r_1(x), ..., r_p(x))^T, r_i : R^n → R, p ≥ n, and ‖·‖ denotes the l2-norm. We denote a local minimizer of (1.1) by x*. This class of problems occurs in practical applications of nonlinear optimization. For example, a nonlinear system of equations r(x) = 0 can be reformulated as (1.1). Several methods based on the Newton method for solving (1.1) have been proposed, which include the Gauss–Newton method, the Levenberg–Marquardt method, and the structured quasi-Newton methods. In particular, many researchers have studied structured quasi-Newton methods. Dennis, Gay and Welsch [1] proposed the structured DFP update. Dennis, Martinez and Tapia [2] derived the structure principle and proved the local and superlinear convergence of the structured BFGS method, which was proposed in [3]. Later, Engels and Martinez [4] unified these methods and proposed the structured Broyden family with one parameter φ_k, and they showed the local and superlinear convergence of the method for the convex class (0 ≤ φ_k ≤ 1) of this family. Yabe and Yamaki [5] proved local and superlinear convergence of the method for a wider class of the structured Broyden family. In addition, Huschens [6] proposed another structured quasi-Newton method with the Huschens–Broyden family and showed its convergence properties in the case 0 ≤ φ_k ≤ 1. Specifically, he showed local and quadratic convergence for zero residual problems, and local and superlinear convergence for non-zero residual problems. Thereafter, Yabe and Ogasawara [7] extended these results to a wider class of the Huschens–Broyden family and proved local convergence properties in a way different from Huschens' proof.
* Corresponding author. E-mail addresses: [email protected] (Y. Narushima), [email protected] (H. Yabe).
In this paper, we consider conjugate gradient methods for solving (1.1). Conjugate gradient methods are iterative methods that generate the sequence {x_k} by

    x_{k+1} = x_k + α_k d_k,    (1.2)

where x_k ∈ R^n is the current estimate of x*, α_k > 0 is a step size and d_k ∈ R^n is a search direction. The direction d_k is recursively defined by

    d_k = -g_k                    for k = 0,
    d_k = -g_k + β_k d_{k-1}      for k ≥ 1,    (1.3)
where g_k denotes ∇f(x_k) and β_k is a scalar which characterizes the method. If f(x) is a strictly convex quadratic function and α_k is the exact one-dimensional minimizer, (1.2)–(1.3) is called the linear conjugate gradient method. On the other hand, (1.2)–(1.3) is called the nonlinear conjugate gradient method for general unconstrained optimization problems. In 1952, Hestenes and Stiefel [8] originally proposed the linear conjugate gradient method for solving a linear system of equations with a symmetric positive definite coefficient matrix, or equivalently for minimizing a strictly convex quadratic function. Their method is given by (1.2)–(1.3) with

    β_k^HS = g_k^T y_{k-1} / (d_{k-1}^T y_{k-1}),    (1.4)
where y_{k-1} = g_k - g_{k-1}. In 1964, nonlinear conjugate gradient methods started with the research of Fletcher and Reeves [9]. In their work, the parameter β_k is given by

    β_k^FR = ‖g_k‖² / ‖g_{k-1}‖².    (1.5)
Zoutendijk [10] proved the global convergence of the Fletcher–Reeves method with the exact line search. Al-Baali [11] extended this result to inexact line searches. Sorenson [12] applied the Hestenes–Stiefel formula (1.4) to general unconstrained optimization problems. Polak and Ribière [13] gave a modification of the Fletcher–Reeves method. Their choice for the parameter β_k was

    β_k^PR = g_k^T y_{k-1} / ‖g_{k-1}‖².    (1.6)
Powell [14] showed that the Polak–Ribière method can cycle infinitely without approaching a solution. Powell [15] suggested the following modification of the parameter β_k for the Polak–Ribière method, which is called the Polak–Ribière Plus method:

    β_k^PR+ = max{β_k^PR, 0}.    (1.7)
Gilbert and Nocedal [16] proved the global convergence of the Polak–Ribière Plus method. Dai and Yuan [17] proposed the conjugate gradient method which generates a descent search direction at every iteration if the Wolfe conditions are satisfied, and they proved its global convergence. Their β_k is presented as

    β_k^DY = ‖g_k‖² / (d_{k-1}^T y_{k-1}).    (1.8)
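For concreteness, the following is a minimal sketch (ours, not taken from the paper's implementation) of how the classical choices (1.4)–(1.8) and the direction update (1.3) can be evaluated from the vectors g_k, g_{k-1}, and d_{k-1}; the step size α_k is assumed to come from a separate line search, and all names here are illustrative only.

```java
// Minimal sketch of the classical beta choices (1.4)-(1.8); vectors are plain double arrays.
public final class ClassicalBetas {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[] subtract(double[] a, double[] b) {
        double[] c = new double[a.length];
        for (int i = 0; i < a.length; i++) c[i] = a[i] - b[i];
        return c;
    }

    // beta^HS (1.4) = g_k^T y_{k-1} / d_{k-1}^T y_{k-1}, with y_{k-1} = g_k - g_{k-1}
    static double betaHS(double[] g, double[] gOld, double[] dOld) {
        double[] y = subtract(g, gOld);
        return dot(g, y) / dot(dOld, y);
    }

    // beta^FR (1.5) = ||g_k||^2 / ||g_{k-1}||^2
    static double betaFR(double[] g, double[] gOld) {
        return dot(g, g) / dot(gOld, gOld);
    }

    // beta^PR (1.6) and beta^PR+ (1.7) = max{beta^PR, 0}
    static double betaPRPlus(double[] g, double[] gOld) {
        double betaPR = dot(g, subtract(g, gOld)) / dot(gOld, gOld);
        return Math.max(betaPR, 0.0);
    }

    // beta^DY (1.8) = ||g_k||^2 / d_{k-1}^T y_{k-1}
    static double betaDY(double[] g, double[] gOld, double[] dOld) {
        return dot(g, g) / dot(dOld, subtract(g, gOld));
    }

    // Search direction (1.3): d_0 = -g_0; d_k = -g_k + beta_k d_{k-1} for k >= 1.
    static double[] direction(double[] g, double[] dOld, double beta) {
        double[] d = new double[g.length];
        for (int i = 0; i < g.length; i++) d[i] = -g[i] + (dOld == null ? 0.0 : beta * dOld[i]);
        return d;
    }
}
```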
These conjugate gradient methods are summarized in [18]. In 1978, Perry [19] constructed a conjugacy condition by using the usual secant condition to accelerate the convergence of conjugate gradient methods, and proposed a conjugate gradient method based on this conjugacy condition. Dai and Liao [20] modified Perry's method and proved its global convergence property. Later, several conjugate gradient methods based on other secant conditions have been studied, including those of Yabe and Takano [21], Zhou and Zhang [22], and Ford, Narushima and Yabe [23], who proved the global convergence properties of their proposed methods. In this paper, following the idea of Dai and Liao, we propose new formulas for β_k by exploiting the structure of the Hessian of the objective function of nonlinear least squares problems. We emphasize that conjugate gradient methods which make use of the structure of problems have hardly been studied. This paper is organized as follows. In Section 2, we state a conjugacy condition and the formulas for β_k proposed in [20,21,22,23]. In Section 3, the structured secant condition for nonlinear least squares problems is described, and we propose a new conjugacy condition and derive new formulas for β_k. In Section 4, global convergence properties of our proposed conjugate gradient methods are shown. In Section 5, numerical experiments are presented.
2. Conjugate gradient methods based on the secant conditions

Within the framework of nonlinear conjugate gradient methods, the following conjugacy condition

    d_k^T y_{k-1} = 0    (2.1)
is considered, because there exists some τ ∈ (0, 1) such that d_k^T y_{k-1} = α_{k-1} d_k^T ∇²f(x_{k-1} + τ α_{k-1} d_{k-1}) d_{k-1} by the mean value theorem. Perry [19] tried to incorporate second-order information into the conjugacy condition to accelerate a conjugate gradient method. Specifically, he focused on the secant condition of quasi-Newton methods:

    B_k s_{k-1} = y_{k-1},    (2.2)

in which the search direction d_k is obtained by solving the linear system of equations

    B_k d_k = -g_k,    (2.3)

where B_k is an approximation to the Hessian and s_{k-1} = x_k - x_{k-1}. By using (2.2) and (2.3), the relation

    d_k^T y_{k-1} = d_k^T B_k s_{k-1} = (B_k d_k)^T s_{k-1} = -g_k^T s_{k-1}    (2.4)

holds. By taking this relation into account, Perry replaced the conjugacy condition (2.1) by the condition

    d_k^T y_{k-1} = -g_k^T s_{k-1}.    (2.5)
Later, Dai and Liao [20] proposed the extended condition

    d_k^T y_{k-1} = -t g_k^T s_{k-1},    (2.6)

where t ≥ 0 is a scalar. In the case t = 0, (2.6) reduces to the usual conjugacy condition (2.1). On the other hand, in the case t = 1, (2.6) becomes Perry's condition (2.5). To ensure that the search direction d_k satisfies condition (2.6), Dai and Liao substituted (1.3) into (2.6), and proposed the following β_k:

    β_k^DL = g_k^T (y_{k-1} - t s_{k-1}) / (d_{k-1}^T y_{k-1}).    (2.7)
Furthermore, they suggested the following modification of (2.7) to show global convergence for general functions:

    β_k^DL+ = max{ g_k^T y_{k-1} / (d_{k-1}^T y_{k-1}), 0 } - t g_k^T s_{k-1} / (d_{k-1}^T y_{k-1}).    (2.8)
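As an illustration of (2.7)–(2.8), the following minimal sketch (ours, not the authors' code) evaluates β_k^DL+ from g_k, g_{k-1}, d_{k-1} and s_{k-1} = x_k - x_{k-1}.

```java
// Sketch of the Dai-Liao "plus" parameter (2.8); t >= 0 is the scalar of (2.6).
public final class DaiLiaoBeta {
    static double betaDLPlus(double[] g, double[] gOld, double[] dOld, double[] s, double t) {
        double gy = 0.0, dy = 0.0, gs = 0.0;
        for (int i = 0; i < g.length; i++) {
            double y = g[i] - gOld[i];   // y_{k-1} = g_k - g_{k-1}
            gy += g[i] * y;              // g_k^T y_{k-1}
            dy += dOld[i] * y;           // d_{k-1}^T y_{k-1}
            gs += g[i] * s[i];           // g_k^T s_{k-1}
        }
        return Math.max(gy / dy, 0.0) - t * gs / dy;
    }
}
```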
Recently, following the idea of Dai and Liao, several new conjugate gradient methods have been studied. Yabe and Takano [21] derived a conjugate gradient method based on the modified secant condition of Zhang, Deng and Chen [24] and Zhang and Xu [25]. The formula for β_k is given by

    β_k^YT+ = max{ g_k^T ŷ_{k-1} / (d_{k-1}^T ŷ_{k-1}), 0 } - t g_k^T s_{k-1} / (d_{k-1}^T ŷ_{k-1}),

where

    ŷ_{k-1} = y_{k-1} + ρ ( θ_{k-1} / (s_{k-1}^T u_{k-1}) ) u_{k-1},
    θ_{k-1} = 6(f_{k-1} - f_k) + 3(g_{k-1} + g_k)^T s_{k-1},

ρ and t are nonnegative parameters, and u_{k-1} is any vector such that s_{k-1}^T u_{k-1} ≠ 0. Later, based on the MBFGS secant condition given in [26], Zhou and Zhang [22] proposed another formula for β_k:
    β_k^ZZ = g_k^T (ẑ_{k-1} - t s_{k-1}) / (d_{k-1}^T ẑ_{k-1}),

where

    ẑ_{k-1} = y_{k-1} + h ‖g_{k-1}‖^m s_{k-1},
and m and h are positive constants. In addition, by using the multi-step secant condition of Ford and Moghrabi [27,28], Ford, Narushima and Yabe [23] derived two new formulas for β_k:

    β_k^MS1+ = max{ g_k^T w_{k-1} / (d_{k-1}^T w_{k-1}), 0 } - t g_k^T q_{k-1} / (d_{k-1}^T w_{k-1}),
    β_k^MS2+ = max{ g_k^T w̃_{k-1} / (d_{k-1}^T w̃_{k-1}), 0 } - t g_k^T q_{k-1} / (d_{k-1}^T w̃_{k-1}),

where

    q_{k-1} = s_{k-1} - ψ_{k-1} s_{k-2},
    w_{k-1} = y_{k-1} - ψ_{k-1} y_{k-2},
    w̃_{k-1} = y_{k-1} - t ψ_{k-1} y_{k-2},
    ψ_{k-1} = δ_{k-1}² / (1 + 2δ_{k-1}),
    δ_{k-1} = η ‖s_{k-1}‖ / ‖s_{k-2}‖,

and η ≥ 0 and 0 ≤ t ≤ 1 are scalar parameters. We should note that the global convergence properties of the methods described above were shown under the assumption that descent search directions were obtained.

3. New formulas for β_k based on structured secant conditions for nonlinear least squares problems

In this section, following the Dai–Liao method, we propose conjugate gradient methods that make use of the structure of the Hessian of the objective function for nonlinear least squares problems.

3.1. Structured secant conditions for nonlinear least squares problems

In this subsection, we briefly review structured secant conditions for nonlinear least squares problems. The gradient vector and the Hessian matrix of the objective function (1.1) have special forms, which are given by
    ∇f(x) = J(x)^T r(x)    (3.1)

and

    ∇²f(x) = J(x)^T J(x) + Σ_{i=1}^p r_i(x) ∇²r_i(x),    (3.2)
respectively, where J(x) denotes the p × n Jacobian matrix of r(x) whose ith row is ∇r_i(x)^T. Since the complete Hessian matrix (3.2) is often expensive to compute, methods have been developed that use only first derivative information. For example, the Gauss–Newton method and the Levenberg–Marquardt method approximate ∇²f(x_k) as follows:

    ∇²f(x_k) ≈ B_k^GN = J(x_k)^T J(x_k)    (3.3)

and

    ∇²f(x_k) ≈ B_k^LM = J(x_k)^T J(x_k) + ν_k I,    (3.4)

where ν_k is a positive parameter such that 0 < ν̲ ≤ ν_k ≤ ν̄, and ν̲ and ν̄ are positive constants. Since these methods neglect the second part of the Hessian matrix (3.2), they can be expected to perform well when the residual f(x*) is small or the functions r_i are close to linear (these cases are called small residual problems). However, when the residual f(x*) is very large and the functions r_i are rather nonlinear, these methods may perform poorly (these cases are called large residual problems). In order to overcome such poor performance, the structured quasi-Newton methods approximate the second part of (3.2) at x_k by A_k, and the approximation to the whole Hessian is given by

    ∇²f(x_k) ≈ B_k^SQN = J(x_k)^T J(x_k) + A_k.    (3.5)

Usually, A_k is obtained by using an updating formula of the form A_k = A_{k-1} + ΔA_{k-1}.
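Since B_k^GN and B_k^LM of (3.3)–(3.4) are needed only through their action on a vector, a minimal sketch (our illustration, with J stored as a dense p × n array; a sparse format would replace the inner loops) is:

```java
// Sketch: the action of B^GN = J^T J and B^LM = J^T J + nu I on a vector v,
// computed with two matrix-vector products and no explicit n x n matrix.
public final class StructuredHessianProducts {

    // w = J v  (J is p x n, v has length n)
    static double[] jTimes(double[][] J, double[] v) {
        double[] w = new double[J.length];
        for (int i = 0; i < J.length; i++)
            for (int j = 0; j < v.length; j++) w[i] += J[i][j] * v[j];
        return w;
    }

    // u = J^T w  (w has length p, result has length n)
    static double[] jTransposeTimes(double[][] J, double[] w) {
        double[] u = new double[J[0].length];
        for (int i = 0; i < J.length; i++)
            for (int j = 0; j < u.length; j++) u[j] += J[i][j] * w[i];
        return u;
    }

    // B^GN v = J^T (J v), as in (3.3)
    static double[] gaussNewtonProduct(double[][] J, double[] v) {
        return jTransposeTimes(J, jTimes(J, v));
    }

    // B^LM v = J^T (J v) + nu v, as in (3.4)
    static double[] levenbergMarquardtProduct(double[][] J, double[] v, double nu) {
        double[] u = gaussNewtonProduct(J, v);
        for (int j = 0; j < u.length; j++) u[j] += nu * v[j];
        return u;
    }
}
```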
We now introduce the condition that is imposed to update A_k. By the Taylor expansion of ∇r_i(x_{k-1}), we have

    ∇r_i(x_{k-1}) ≈ ∇r_i(x_k) + ∇²r_i(x_k)(x_{k-1} - x_k),

which implies

    ∇²r_i(x_k)(x_k - x_{k-1}) ≈ ∇r_i(x_k) - ∇r_i(x_{k-1}).

By multiplying both sides by r_i(x_k) and summing over i = 1, ..., p, we have

    Σ_{i=1}^p r_i(x_k) ∇²r_i(x_k)(x_k - x_{k-1}) ≈ Σ_{i=1}^p r_i(x_k)(∇r_i(x_k) - ∇r_i(x_{k-1}))
                                                 = (J(x_k) - J(x_{k-1}))^T r(x_k).

Thus the matrix A_{k-1} is updated so that the new matrix A_k satisfies the following secant condition

    A_k s_{k-1} = (J(x_k) - J(x_{k-1}))^T r(x_k),

which leads to the structured secant condition for B_k^SQN in (3.5):

    B_k^SQN s_{k-1} = J(x_k)^T J(x_k) s_{k-1} + A_k s_{k-1}
                    = J(x_k)^T J(x_k) s_{k-1} + (J(x_k) - J(x_{k-1}))^T r(x_k).    (3.6)
The Gauss–Newton method possesses the quadratic convergence property in the case of zero residual problems, because Σ_{i=1}^p r_i(x*) ∇²r_i(x*) = 0 holds. However, structured quasi-Newton methods do not perform as well as the Gauss–Newton method does. This is caused by the fact that a matrix A_k generated by quasi-Newton updates does not approach the zero matrix. In order to overcome this deficiency, as another structured quasi-Newton method, Huschens [6] proposed to approximate ∇²f(x) so that

    ∇²f(x_k) ≈ B_k^H = J(x_k)^T J(x_k) + ‖r(x_k)‖ C_k,    (3.7)

where the matrix C_k is the kth approximation to Σ_{i=1}^p ( r_i(x_k) / ‖r(x_{k-1})‖ ) ∇²r_i(x_k). The matrix C_k is also obtained by using an updating formula of the form C_k = C_{k-1} + ΔC_{k-1}. In the same way as for the structured quasi-Newton method above, the matrix C_{k-1} is updated so that the new matrix C_k satisfies the following secant condition

    C_k s_{k-1} = (J(x_k) - J(x_{k-1}))^T ( r(x_k) / ‖r(x_{k-1})‖ ),

which implies the structured secant condition for B_k^H in (3.7):

    B_k^H s_{k-1} = J(x_k)^T J(x_k) s_{k-1} + ‖r(x_k)‖ C_k s_{k-1}
                  = J(x_k)^T J(x_k) s_{k-1} + ( ‖r(x_k)‖ / ‖r(x_{k-1})‖ ) (J(x_k) - J(x_{k-1}))^T r(x_k).    (3.8)
Next we introduce the structured secant conditions of the form

    B_k s_{k-1} = z_{k-1}.    (3.9)

By substituting (3.3) and (3.4) into (3.9), the vectors z_{k-1} that correspond to the Gauss–Newton method and the Levenberg–Marquardt method are defined by

    z_{k-1}^GN = B_k^GN s_{k-1} = J(x_k)^T J(x_k) s_{k-1}    (3.10)

and

    z_{k-1}^LM = B_k^LM s_{k-1} = (J(x_k)^T J(x_k) + ν_k I) s_{k-1},    (3.11)

respectively. On the other hand, by (3.6), z_{k-1} can be written as

    z_{k-1} = J(x_k)^T J(x_k) s_{k-1} + (J(x_k) - J(x_{k-1}))^T r(x_k).    (3.12)

By incorporating a nonnegative parameter ρ_k into (3.12), we have

    z_{k-1}^SQN = J(x_k)^T J(x_k) s_{k-1} + ρ_k (J(x_k) - J(x_{k-1}))^T r(x_k).    (3.13)
In addition, by (3.8), z_{k-1} can be written as

    z_{k-1} = J(x_k)^T J(x_k) s_{k-1} + ( ‖r(x_k)‖ / ‖r(x_{k-1})‖ ) (J(x_k) - J(x_{k-1}))^T r(x_k).    (3.14)

Similarly, by incorporating a nonnegative parameter ρ_k into (3.14), we have

    z_{k-1}^H = J(x_k)^T J(x_k) s_{k-1} + ρ_k ( ‖r(x_k)‖ / ‖r(x_{k-1})‖ ) (J(x_k) - J(x_{k-1}))^T r(x_k).    (3.15)

In the case ρ_k = 0, (3.13) and (3.15) reduce to (3.10). In the case ρ_k = 1, (3.13) and (3.15) become (3.12) and (3.14), respectively.

3.2. New formulas for β_k

In this subsection, based on the arguments in Section 2, we apply the structured secant conditions given in the previous subsection to conjugate gradient methods, and we propose new formulas for β_k. As calculated in (2.4), Eqs. (2.3) and (3.9) yield

    d_k^T z_{k-1} = d_k^T B_k s_{k-1} = (B_k d_k)^T s_{k-1} = -g_k^T s_{k-1}.

By taking this relation into consideration, we introduce a nonnegative parameter t_k to obtain

    d_k^T z_{k-1} = -t_k g_k^T s_{k-1}.    (3.16)
Substituting (1.3) into (3.16) gives

    (-g_k + β_k d_{k-1})^T z_{k-1} = -t_k g_k^T s_{k-1},

and then we obtain a new β_k in the form

    β_k^new = g_k^T (z_{k-1} - t_k s_{k-1}) / (d_{k-1}^T z_{k-1}).

Now we denote β_k^new with (3.10), (3.11), (3.13), or (3.15) by β_k^GN, β_k^LM, β_k^SQN, or β_k^H, respectively, that is,

    β_k^GN  = g_k^T (z_{k-1}^GN  - t_k s_{k-1}) / (d_{k-1}^T z_{k-1}^GN),    (3.17)
    β_k^LM  = g_k^T (z_{k-1}^LM  - t_k s_{k-1}) / (d_{k-1}^T z_{k-1}^LM),    (3.18)
    β_k^SQN = g_k^T (z_{k-1}^SQN - t_k s_{k-1}) / (d_{k-1}^T z_{k-1}^SQN),    (3.19)
    β_k^H   = g_k^T (z_{k-1}^H   - t_k s_{k-1}) / (d_{k-1}^T z_{k-1}^H).    (3.20)
Remark 1. Though the vectors z_{k-1}^GN, z_{k-1}^LM, z_{k-1}^SQN, and z_{k-1}^H contain the Jacobian matrix, we only use matrix–vector products. Thus we can use the sparse structure of the Jacobian matrix in practical computations.
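To make Remark 1 concrete, the following sketch (our illustration only, using dense Jacobians for simplicity) forms z_{k-1}^SQN of (3.13) and β_k^SQN of (3.19) using nothing but matrix–vector products with J(x_k) and J(x_{k-1}); a sparse Jacobian format would replace the inner loops.

```java
// Sketch of z^SQN_{k-1} = J_k^T J_k s_{k-1} + rho_k (J_k - J_{k-1})^T r_k   (3.13)
// and beta^SQN_k = g_k^T (z^SQN_{k-1} - t_k s_{k-1}) / d_{k-1}^T z^SQN_{k-1}   (3.19).
public final class StructuredCgBeta {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[] jTimes(double[][] J, double[] v) {           // J v
        double[] w = new double[J.length];
        for (int i = 0; i < J.length; i++)
            for (int j = 0; j < v.length; j++) w[i] += J[i][j] * v[j];
        return w;
    }

    static double[] jTransposeTimes(double[][] J, double[] w) {  // J^T w
        double[] u = new double[J[0].length];
        for (int i = 0; i < J.length; i++)
            for (int j = 0; j < u.length; j++) u[j] += J[i][j] * w[i];
        return u;
    }

    static double[] zSQN(double[][] Jk, double[][] JkOld, double[] rk, double[] s, double rho) {
        double[] z = jTransposeTimes(Jk, jTimes(Jk, s));          // J_k^T J_k s_{k-1}
        double[] a = jTransposeTimes(Jk, rk);                     // J_k^T r_k
        double[] b = jTransposeTimes(JkOld, rk);                  // J_{k-1}^T r_k
        for (int j = 0; j < z.length; j++) z[j] += rho * (a[j] - b[j]);
        return z;
    }

    static double betaSQN(double[] g, double[] dOld, double[] s, double[] z, double t) {
        return (dot(g, z) - t * dot(g, s)) / dot(dOld, z);
    }
}
```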
Remark 2. In the case of linear least squares problems, the objective function can be represented by the quadratic function

    f(x) = (1/2) ‖Ax - b‖²,   A ∈ R^{p×n}, x ∈ R^n, b ∈ R^p.

Since we have J(x) = A and

    y_{k-1} = g_k - g_{k-1} = (A^T A x_k - A^T b) - (A^T A x_{k-1} - A^T b) = A^T A s_{k-1},

it follows that z_{k-1} = A^T A s_{k-1} = y_{k-1} except for (3.11). If the step size is calculated by the exact line search, we obtain g_k^T s_{k-1} = 0. In this case, β_k^GN, β_k^SQN, and β_k^H reduce to β_k^HS. Therefore our methods become the linear conjugate gradient method for linear least squares problems.
4. Convergence analysis

In this section, we show global convergence properties of the conjugate gradient methods with (3.17)–(3.20) under some assumptions. Throughout this section, we assume that

    g_k ≠ 0   for all k,

otherwise a stationary point is found. We make the following standard assumptions on the objective function.

Assumption A1. The level set L = {x | f(x) ≤ f(x_0)} at x_0 is bounded, namely, there exists a positive constant â such that

    ‖x‖ ≤ â   for all x ∈ L.    (4.1)

Assumption A2. In some neighborhood N of L, r(x) is continuously differentiable, and the Jacobian matrix J(x) is Lipschitz continuous, namely, there exists a positive constant L such that

    ‖J(x) - J(x̄)‖ ≤ L ‖x - x̄‖   for all x, x̄ ∈ N.    (4.2)
Now we give some estimates for r(x), J(x) and ∇f(x) under Assumptions A1 and A2, which will be used in proving the subsequent lemmas and theorems. Since using (4.1) and (4.2) yields

    ‖J(x)‖ ≤ ‖J(x) - J(x_0)‖ + ‖J(x_0)‖ ≤ L‖x - x_0‖ + ‖J(x_0)‖ ≤ 2Lâ + ‖J(x_0)‖   for all x ∈ L,

there exists a positive constant γ such that

    ‖J(x)‖ ≤ γ   for all x ∈ L.    (4.3)

In addition, by using (4.2) and the mean value theorem, we have for any x and x̄ ∈ L

    ‖r(x) - r(x̄) - J(x̄)(x - x̄)‖ ≤ ∫_0^1 ‖J(x̄ + τ(x - x̄)) - J(x̄)‖ ‖x - x̄‖ dτ
                                  ≤ L‖x - x̄‖² ∫_0^1 τ dτ
                                  = (L/2) ‖x - x̄‖².

This implies that

    ‖r(x) - r(x̄)‖ ≤ (L/2) ‖x - x̄‖² + ‖J(x̄)(x - x̄)‖
                   ≤ ( (L/2) ‖x - x̄‖ + ‖J(x̄)‖ ) ‖x - x̄‖
                   ≤ (Lâ + γ) ‖x - x̄‖.

It follows from the above inequality that there exists a positive constant L′ such that

    ‖r(x) - r(x̄)‖ ≤ L′ ‖x - x̄‖   for all x, x̄ ∈ L.    (4.4)

Similarly to (4.3), there exists a positive constant γ′ such that

    ‖r(x)‖ ≤ γ′   for all x ∈ L.    (4.5)

By using (4.2)–(4.5), we have

    ‖∇f(x) - ∇f(x̄)‖ = ‖J(x)^T (r(x) - r(x̄)) + (J(x) - J(x̄))^T r(x̄)‖
                     ≤ ‖J(x)‖ ‖r(x) - r(x̄)‖ + ‖J(x) - J(x̄)‖ ‖r(x̄)‖
                     ≤ L′ ‖J(x)‖ ‖x - x̄‖ + L ‖x - x̄‖ ‖r(x̄)‖
                     ≤ (L′γ + Lγ′) ‖x - x̄‖.

Thus there exists a positive constant L̄ such that ‖∇f(x) - ∇f(x̄)‖ ≤ L̄ ‖x - x̄‖ for all x, x̄ ∈ L, which implies that there exists a positive constant γ̄ such that

    ‖∇f(x)‖ ≤ γ̄   for all x ∈ L.    (4.6)
In order to prove the global convergence of the methods with β_k^GN, β_k^SQN, and β_k^H, we make the following additional assumption.
Assumption A3. In the level set L, the rank of the Jacobian matrix J(x) is n.

We note that Assumption A3 guarantees that there exists a positive constant λ such that

    ‖(J(x)^T J(x))^{-1}‖ ≤ λ   for all x ∈ L.    (4.7)

Furthermore, we make assumptions on the search direction and the line search conditions. We suppose that each search direction d_k satisfies the descent condition

    g_k^T d_k < 0.    (4.8)

If necessary, we make the stronger assumption that each search direction satisfies the sufficient descent condition

    g_k^T d_k ≤ -c̄ ‖g_k‖²    (4.9)

for some positive constant c̄. The step size α_k in (1.2) is obtained by performing a line search in which the following Wolfe conditions are satisfied:

    f(x_k + α_k d_k) - f(x_k) ≤ δ α_k g_k^T d_k,    (4.10)
    ∇f(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k,    (4.11)

where δ and σ are constants such that 0 < δ < σ < 1. Inequalities (4.10) and (4.11) are called the sufficient decrease condition and the curvature condition, respectively. We also use the strong Wolfe conditions, which consist of (4.10) and the following inequality:

    |∇f(x_k + α_k d_k)^T d_k| ≤ -σ g_k^T d_k.    (4.12)
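As an illustration of (4.10)–(4.12), a minimal sketch (ours, not the line search of [31,32] that is used in the experiments) of the acceptance tests for a trial step α is:

```java
// Sketch of the Wolfe and strong Wolfe acceptance tests for a trial step alpha.
// fk, gkDk : f(x_k) and g_k^T d_k at the current point (gkDk < 0 by (4.8));
// fTrial   : f(x_k + alpha d_k);  gTrialDk : grad f(x_k + alpha d_k)^T d_k.
public final class WolfeTests {

    static boolean sufficientDecrease(double fTrial, double fk, double alpha,
                                      double gkDk, double delta) {
        return fTrial - fk <= delta * alpha * gkDk;                  // (4.10)
    }

    static boolean wolfe(double fTrial, double fk, double alpha,
                         double gkDk, double gTrialDk, double delta, double sigma) {
        return sufficientDecrease(fTrial, fk, alpha, gkDk, delta)
            && gTrialDk >= sigma * gkDk;                             // (4.11)
    }

    static boolean strongWolfe(double fTrial, double fk, double alpha,
                               double gkDk, double gTrialDk, double delta, double sigma) {
        return sufficientDecrease(fTrial, fk, alpha, gkDk, delta)
            && Math.abs(gTrialDk) <= -sigma * gkDk;                  // (4.12)
    }
}
```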
The following two theorems will play important roles in proving the global convergence properties in Sections 4.1 and 4.2. These theorems can be shown by using only the descent condition (4.8) and the sufficient decrease condition (4.10). We emphasize that neither condition (4.9) nor condition (4.11) is used. In what follows, we denote J_k = J(x_k) and r_k = r(x_k) for simplicity.

Theorem 1. Suppose that Assumptions A1–A3 hold. Consider the conjugate gradient methods in the form (1.2)–(1.3) with β_k^SQN and β_k^H. Assume that t_k satisfies 0 ≤ t_k ≤ t̄ for a constant t̄, and that d_k and α_k satisfy the descent condition (4.8) and the sufficient decrease condition (4.10), respectively. If ρ_k satisfies 0 ≤ ρ_k ≤ ρ̄ < 1/(Lγ′(λγ)²) for some positive constant ρ̄, then we have

    ‖d_k‖ < c   for all k

for some positive constant c.

Proof. First, we consider the conjugate gradient method with β_k^SQN. By the descent condition (4.8) and the sufficient decrease condition (4.10), we note that the sequence {x_k} is contained in the level set L. It follows from (3.13), (4.2), (4.3) and (4.5) that

    |g_k^T z_{k-1}^SQN - t_k g_k^T s_{k-1}| = |g_k^T {J_k^T J_k s_{k-1} + ρ_k (J_k - J_{k-1})^T r_k} - t_k g_k^T s_{k-1}|
        ≤ ‖g_k‖ ‖J_k^T J_k‖ ‖s_{k-1}‖ + ρ_k ‖g_k‖ ‖J_k - J_{k-1}‖ ‖r_k‖ + t_k ‖g_k‖ ‖s_{k-1}‖
        ≤ γ² ‖g_k‖ ‖s_{k-1}‖ + ρ_k Lγ′ ‖g_k‖ ‖s_{k-1}‖ + t_k ‖g_k‖ ‖s_{k-1}‖
        ≤ (γ² + ρ̄Lγ′ + t̄) ‖g_k‖ ‖s_{k-1}‖.    (4.13)

Since Assumption A3 guarantees the nonsingularity of J_k^T J_k, we have

    ‖d_{k-1}‖ = ‖(J_k^T J_k)^{-1} J_k^T J_k d_{k-1}‖ ≤ ‖(J_k^T J_k)^{-1} J_k^T‖ ‖J_k d_{k-1}‖.    (4.14)

Therefore, (3.13), (4.2), (4.3), (4.5), (4.7) and (4.14) yield

    |d_{k-1}^T z_{k-1}^SQN| = |d_{k-1}^T {J_k^T J_k s_{k-1} + ρ_k (J_k - J_{k-1})^T r_k}|
        ≥ α_{k-1} ‖J_k d_{k-1}‖² - ρ_k ‖d_{k-1}‖ ‖J_k - J_{k-1}‖ ‖r_k‖
        ≥ α_{k-1} ( ‖d_{k-1}‖ / ‖(J_k^T J_k)^{-1} J_k^T‖ )² - ρ_k L ‖d_{k-1}‖ ‖s_{k-1}‖ ‖r_k‖
        ≥ α_{k-1} ( ‖d_{k-1}‖ / ( ‖(J_k^T J_k)^{-1}‖ ‖J_k‖ ) )² - ρ_k L α_{k-1} ‖d_{k-1}‖² ‖r_k‖
        ≥ ( 1/(λγ)² - ρ̄Lγ′ ) α_{k-1} ‖d_{k-1}‖².    (4.15)
We note that the condition 0 ≤ ρ̄ < 1/(Lγ′(λγ)²) implies 1/(λγ)² - ρ̄Lγ′ > 0. By (1.3), (3.19), (4.13), (4.15) and (4.6), we obtain

    ‖d_k‖ = ‖-g_k + β_k^SQN d_{k-1}‖
          ≤ ‖g_k‖ + |β_k^SQN| ‖d_{k-1}‖
          = ‖g_k‖ + ( |g_k^T z_{k-1}^SQN - t_k g_k^T s_{k-1}| / |d_{k-1}^T z_{k-1}^SQN| ) ‖d_{k-1}‖
          ≤ ‖g_k‖ + ( (γ² + ρ̄Lγ′ + t̄) ‖g_k‖ ‖s_{k-1}‖ / ( ( 1/(λγ)² - ρ̄Lγ′ ) α_{k-1} ‖d_{k-1}‖² ) ) ‖d_{k-1}‖
          ≤ ( 1 + (γ² + ρ̄Lγ′ + t̄) / ( 1/(λγ)² - ρ̄Lγ′ ) ) γ̄.

This completes the proof of the case β_k = β_k^SQN. Since it follows from the sufficient decrease condition (4.10) that ‖r_k‖/‖r_{k-1}‖ ≤ 1 holds, the proof of the case β_k = β_k^H is the same as that of the case β_k = β_k^SQN. Thus we omit it. Therefore the theorem is proved.

If ρ_k = 0, then β_k^SQN and β_k^H reduce to β_k^GN. Thus we obtain the following corollary.
Corollary 2. Suppose that Assumptions A1–A3 hold. Consider the conjugate gradient method in the form (1.2)–(1.3) with β_k^GN. Assume that t_k satisfies 0 ≤ t_k ≤ t̄ for a constant t̄, and that d_k and α_k satisfy the descent condition (4.8) and the sufficient decrease condition (4.10), respectively. Then we have

    ‖d_k‖ < c   for all k

for some positive constant c.

Next, we deal with the case β_k = β_k^LM. In the following theorem, we do not need Assumption A3.

Theorem 3. Suppose that Assumptions A1 and A2 hold. Consider the conjugate gradient method in the form (1.2)–(1.3) with β_k^LM. Assume that t_k satisfies 0 ≤ t_k ≤ t̄ for a constant t̄, and that d_k and α_k satisfy the descent condition (4.8) and the sufficient decrease condition (4.10), respectively. If ν_k satisfies ν̲ ≤ ν_k ≤ ν̄ for positive constants ν̲ and ν̄, then we have

    ‖d_k‖ < c   for all k

for some positive constant c.

Proof. By the descent condition (4.8) and the sufficient decrease condition (4.10), we note that the sequence {x_k} is contained in the level set L. It follows from (3.11) and (4.3) that

    |g_k^T z_{k-1}^LM - t_k g_k^T s_{k-1}| = |g_k^T (J_k^T J_k + ν_k I) s_{k-1} - t_k g_k^T s_{k-1}|
        ≤ ‖g_k‖ ‖J_k^T J_k‖ ‖s_{k-1}‖ + ν_k ‖g_k‖ ‖s_{k-1}‖ + t_k ‖g_k‖ ‖s_{k-1}‖
        ≤ (γ² + ν̄ + t̄) ‖g_k‖ ‖s_{k-1}‖.    (4.16)

It also follows from (3.11) that

    |d_{k-1}^T z_{k-1}^LM| = |d_{k-1}^T (J_k^T J_k + ν_k I) s_{k-1}|
        = α_{k-1} ‖J_k d_{k-1}‖² + ν_k α_{k-1} ‖d_{k-1}‖²
        ≥ ν̲ α_{k-1} ‖d_{k-1}‖².    (4.17)

By (1.3), (3.18), (4.16), (4.17) and (4.6), we have

    ‖d_k‖ = ‖-g_k + β_k^LM d_{k-1}‖
          ≤ ‖g_k‖ + ( |g_k^T z_{k-1}^LM - t_k g_k^T s_{k-1}| / |d_{k-1}^T z_{k-1}^LM| ) ‖d_{k-1}‖
          ≤ ‖g_k‖ + ( (γ² + ν̄ + t̄) ‖g_k‖ ‖s_{k-1}‖ / ( ν̲ α_{k-1} ‖d_{k-1}‖² ) ) ‖d_{k-1}‖
          = ‖g_k‖ + (γ² + ν̄ + t̄) ‖g_k‖ / ν̲
          ≤ ( 1 + (γ² + ν̄ + t̄) / ν̲ ) γ̄.

Therefore, the proof is complete.
Table 1
Test problems.

Function name                       | n/p             | No. of table | f(x*)
Extended Rosenbrock function        | 100 000/100 000 | Table 3      | 0
Extended Powell Singular function   | 100 000/100 000 | Table 4      | 0
Broyden Tridiagonal function        | 100 000/100 000 | Table 5      | 0
Trigonometric function              | 1000/1000       | Table 6      | 0
Wood function                       | 4/6             | Table 7      | 0
Kowalik and Osborne function        | 4/11            | Table 8      | 1.5375··· × 10^-4
Bard function                       | 3/15            | Table 9      | 4.1074··· × 10^-3
Brown and Dennis function           | 4/10            | Table 10     | 0.7216···
Penalty I function                  | 100 000/100 001 | Table 11     | 0.4984···
Freudenstein and Roth function      | 2/2             | Table 12     | 24.492···
Table 2
The number of non-zero elements of the Jacobian matrix.

Ex. Rosenbrock | Ex. Powell | Penalty 1 | Broyden tridiagonal | Trigonometric
3n/2           | 2n         | 2n        | 3n − 2              | n²
In the following subsections, we show global convergence properties of the proposed methods. In Section 4.1, we prove global convergence under the sufficient descent condition (4.9) and the Wolfe conditions (4.10) and (4.11). In Section 4.2, we prove global convergence under the descent condition (4.8) and the strong Wolfe conditions (4.10) and (4.12).

4.1. Global convergence under the sufficient descent condition and the Wolfe conditions

The following lemma is an immediate result from the Zoutendijk condition [13].

Lemma 4. Suppose that Assumptions A1 and A2 hold. Consider any iterative method in the form (1.2). Assume that d_k and α_k satisfy the sufficient descent condition (4.9) and the Wolfe conditions (4.10) and (4.11), respectively. If

    Σ_{k≥0} 1/‖d_k‖² = ∞,    (4.18)

then the following holds:

    lim inf_{k→∞} ‖g_k‖ = 0.    (4.19)
Proof. By the assumptions of this lemma, d_k and α_k satisfy the descent condition (4.8) and the Wolfe conditions (4.10) and (4.11), respectively. Thus, the following Zoutendijk condition holds:

    Σ_{k≥0} (g_k^T d_k)² / ‖d_k‖² < ∞    (4.20)

(see Theorem 3.2 of [13]). If (4.19) is not true, there exists a positive constant ε such that

    ‖g_k‖ > ε   for all k.    (4.21)

It follows from (4.9) and (4.21) that

    g_k^T d_k ≤ -c̄ ‖g_k‖² < -c̄ ε².

By squaring both sides of the above inequality and dividing by ‖d_k‖², we have

    (g_k^T d_k)² / ‖d_k‖² > c̄² ε⁴ / ‖d_k‖²,

which by (4.18) implies

    Σ_{k≥0} (g_k^T d_k)² / ‖d_k‖² = ∞.

This contradicts the Zoutendijk condition (4.20). Therefore the proof is complete.
By using Theorems 1 and 3, Corollary 2 and Lemma 4, we obtain the following convergence theorem.
Table 3 Extended Rosenbrock function (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
23/156 0.842
23/150 0.826
23/161 0.811
63/200 1.185
84/228 1.451
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
22/154 0.78
22/154 0.686
22/154 0.686
22/154 0.671
22/155 0.609
22/155 0.64
22/155 0.671
(c) Numerical results for CGLM
ν 1 0.1 0.01 0.001
t 0
0.1
0.3
0.5
0.7
0.9
1
50/193 3.915 27/136 1.919 24/127 1.825 24/129 1.762
50/193 3.432 28/136 2.028 24/126 1.763 25/131 1.779
50/194 3.478 26/130 1.825 25/131 1.794 26/131 1.904
50/195 3.37 24/134 1.732 25/129 1.794 24/130 1.747
50/195 3.401 24/132 1.762 24/127 1.731 25/129 1.809
50/195 3.401 22/122 1.591 26/128 1.872 24/124 1.716
50/195 3.416 28/135 1.95 26/128 1.857 25/126 1.825
(d) Numerical results for CGSQN
ρ 0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
t 0
0.1
0.3
0.5
0.7
0.9
1
25/124 3.261 21/127 2.496 18/121 2.137 16/119 1.918 16/115 1.966 15/116 1.841 15/120 1.81
24/122 2.73 21/127 2.59 18/121 2.168 16/120 1.966 16/115 1.935 15/116 1.887 15/121 1.934
25/126 2.901 21/133 2.496 18/121 2.184 16/119 1.966 16/115 1.965 16/115 1.935 15/122 1.825
24/122 2.808 21/133 2.511 18/121 2.184 18/126 2.184 15/110 1.81 16/115 1.965 15/122 1.919
24/123 2.746 21/133 2.45 18/121 2.215 15/115 1.84 15/110 1.856 16/115 1.918 15/122 1.841
24/124 2.761 21/133 2.558 18/121 2.122 15/115 1.84 16/112 1.903 16/115 1.888 15/122 1.919
24/124 2.761 21/134 2.465 18/121 2.106 15/115 1.81 16/112 1.95 16/115 2.012 15/122 1.841
0
0.1
0.3
0.5
0.7
0.9
1
25/124 3.245 20/118 2.356 18/125 2.199 18/122 2.215 17/128 2.059 15/117 1.935 16/124 2.028
24/122 2.855 20/118 2.371 18/125 2.121 18/122 2.153 15/121 1.934 15/117 1.841 16/124 1.95
25/126 2.886 22/119 2.543 18/125 2.2 18/122 2.106 15/121 1.825 19/120 2.231 16/124 1.997
24/122 2.824 21/118 2.433 18/125 2.153 18/122 2.137 17/128 2.106 17/118 1.997 16/124 1.997
24/123 2.839 19/114 2.262 17/112 2.06 18/122 2.152 14/116 1.701 17/119 2.09 16/124 1.997
24/124 2.808 21/124 2.527 18/115 2.106 18/122 2.137 15/122 1.918 17/119 2.121 16/124 1.95
24/124 2.776 23/128 2.668 16/111 1.981 18/122 2.168 15/121 1.856 17/119 2.06 16/124 1.965
(e) Numerical results for CGH
ρ 0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
t
Theorem 5. Suppose that Assumptions A1 and A2 hold. Consider the conjugate gradient method in the form (1.2)–(1.3). Assume that t_k satisfies 0 ≤ t_k ≤ t̄ for a constant t̄, and that d_k and α_k satisfy the sufficient descent condition (4.9) and the Wolfe conditions (4.10) and (4.11), respectively.
(i) If Assumption A3 holds and ρ_k satisfies 0 ≤ ρ_k ≤ ρ̄ < 1/(Lγ′(λγ)²) for some positive constant ρ̄, then the conjugate gradient methods with β_k^SQN and β_k^H converge globally in the sense that lim inf_{k→∞} ‖g_k‖ = 0.
Table 4 Extended Powell Singular function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
862/2046 124.519
86/271 15.491
77/272 15.397
Failed
862/1503 92.336
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
155/448 26.598
329/852 51.589
53/211 11.762
316/796 48.157
169/462 27.456
247/635 38.298
331/840 51.153
(c) Numerical results for CGLM
ν
t
1 0.1 0.01 0.001
0
0.1
0.3
0.5
0.7
0.9
1
Failed Failed Failed Failed
Failed Failed Failed Failed
Failed Failed Failed Failed
Failed Failed Failed Failed
Failed Failed Failed 573/1266
Failed Failed Failed Failed
Failed Failed Failed Failed
(d) Numerical results for CGSQN
ρ
t 0
0.1
0.3
0.5
0.7
0.9
1
0 (CGGN)
97/303 29.422 166/458 47.471 109/313 31.106 72/219 20.826 143/446 42.884 62/238 20.639 76/244 22.449
211/570 59.857 183/486 50.918 214/613 62.01 78/246 23.259 125/394 37.456 247/656 67.86 165/482 47.502
149/398 41.808 142/379 39.515 187/556 55.068 123/350 34.788 158/438 44.226 102/311 29.858 118/356 34.491
113/346 34.288 156/460 46.082 117/354 34.694 93/312 28.953 57/202 17.675 134/385 37.845 241/625 65.224
48/191 16.474 150/436 43.524 155/426 43.555 202/554 56.8 109/321 31.2 744/1825 197.808 522/1247 137.155
202/557 57.751 104/320 30.982 49/186 16.052 248/640 68 87/294 27.003 63/223 19.734 996/2339 258.321
110/338 33.103 70/245 22.418 73/231 21.684 60/216 19.344 297/742 78.686 74/269 23.79 101/303 29.203
0.1 0.3 0.5 0.7 0.9 1
(e) Numerical results for CGH
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
97/303 29.593 56/201 17.862 148/417 42.12 46/192 16.021 164/471 46.831 94/270 26.395 78/237 22.542
211/570 60.028 105/342 32.651 123/370 36.208 282/723 76.315 52/179 15.975 142/429 41.621 160/457 45.661
149/398 41.823 78/262 24.164 126/385 37.316 97/307 29.234 85/298 27.004 123/367 35.677 390/998 105.674
113/346 33.836 91/298 27.986 138/411 40.576 200/550 55.989 88/272 25.694 105/317 30.56 304/776 82.244
48/191 16.583 108/327 32.058 77/265 24.367 204/531 55.427 144/395 39.951 72/225 21.169 262/702 72.789
202/557 57.689 152/425 43.103 212/623 62.571 215/563 58.703 119/343 33.867 116/332 32.792 169/450 46.316
110/338 33.103 Failed 208/567 58.594 82/261 24.336 105/345 32.433 240/648 66.908 68/230 20.998
(ii) If Assumption A3 holds, then the conjugate gradient method with β_k^GN converges globally in the sense that lim inf_{k→∞} ‖g_k‖ = 0.
(iii) If ν_k satisfies ν̲ ≤ ν_k ≤ ν̄ for positive constants ν̲ and ν̄, then the conjugate gradient method with β_k^LM converges globally in the sense that lim inf_{k→∞} ‖g_k‖ = 0.

Proof. (i) By Theorem 1, we have

    ‖d_k‖ < c   for all k
Table 5 Broyden Tridiagonal function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
31/121 1.217
29/119 1.155
29/119 1.154
448/621 6.895
44/121 1.311
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
31/121 1.186
31/121 1.264
31/121 1.342
31/121 1.326
31/121 1.31
31/121 1.201
31/121 1.342
(c) Numerical results for CGLM
ν
t
1 0.1 0.01 0.001
0
0.1
0.3
0.5
0.7
0.9
1
30/112 4.555 30/114 4.322 30/114 4.259 30/114 4.259
30/112 4.244 30/114 4.446 30/114 4.274 30/114 4.228
30/112 4.259 30/120 4.384 30/120 4.321 30/114 4.244
31/116 4.524 30/120 4.29 30/120 4.275 30/120 4.274
31/115 4.493 30/120 4.306 30/120 4.337 30/120 4.212
31/115 4.353 31/121 4.539 31/121 4.446 31/121 4.43
31/115 4.446 31/121 4.508 31/121 4.415 31/121 4.493
(d) Numerical results for CGSQN
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
30/114 7.316 30/117 7.176 30/118 7.27 30/118 7.27 30/117 6.786 30/117 7.02 30/117 6.786
30/114 7.348 30/117 7.238 30/118 7.394 30/118 7.378 30/117 6.833 30/117 6.646 30/117 6.739
30/114 7.332 30/117 7.254 30/118 7.192 30/118 7.332 30/117 6.708 30/117 6.552 30/117 6.692
30/120 7.348 30/114 7.191 30/117 7.473 30/118 6.864 30/117 6.771 30/117 6.849 30/117 6.786
30/120 7.192 30/114 7.207 30/117 7.41 30/118 6.864 30/117 6.786 30/117 6.708 30/117 6.584
31/121 7.473 30/120 7.208 30/117 7.301 30/118 6.755 30/117 6.692 30/117 6.74 30/117 6.677
31/121 7.472 31/121 7.426 30/117 7.145 30/118 7.082 30/117 6.771 30/117 7.036 30/117 6.724
(e) Numerical results for CGH
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
30/114 7.317 30/117 7.207 30/117 7.566 30/118 7.629 30/118 6.817 30/122 6.723 30/122 6.942
30/114 7.317 30/117 7.488 30/117 7.582 30/118 7.363 30/118 6.958 30/118 6.848 30/122 6.973
30/114 7.457 30/114 7.145 30/117 7.379 30/118 7.363 30/118 6.942 30/118 6.708 30/122 6.755
30/120 7.378 30/114 7.269 30/117 7.254 30/117 7.254 30/118 6.958 30/118 6.848 30/118 6.833
30/120 7.332 30/120 7.27 30/114 7.394 30/117 6.786 30/118 6.942 30/118 7.114 30/118 6.833
31/121 7.737 30/120 7.551 30/114 7.534 30/117 6.927 30/118 6.801 30/118 6.724 30/118 6.833
31/121 7.675 30/120 7.395 30/121 7.535 30/117 6.802 30/117 6.646 30/118 6.957 30/118 6.63
for some positive constant c. Therefore, we obtain immediately

    Σ_{k≥0} 1/‖d_k‖² = ∞.

It follows from Lemma 4 that lim inf_{k→∞} ‖g_k‖ = 0.
Table 6 Trigonometric function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
320/323 0.187
233/235 0.141
228/230 0.14
Failed
213/418 0.156
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
228/230 0.156
78/80 0.031
69/71 0.031
69/72 0.031
67/70 0.031
68/71 0.031
72/75 0.047
(c) Numerical results for CGLM
ν
t
1 0.1 0.01 0.001
0
0.1
0.3
0.5
0.7
0.9
1
369/371 165.391 379/381 176.81 382/384 179.213 388/390 182.691
326/328 155.859 305/307 141.882 178/180 82.836 164/166 75.551
321/323 151.507 137/139 65.036 91/96 43.493 86/91 40.669
314/316 146.671 95/99 44.522 77/82 35.708 75/79 34.055
307/309 143.722 80/84 37.347 71/75 33.898 72/76 33.759
272/274 127.125 75/79 35.209 79/87 36.785 80/89 37.19
244/246 114.894 77/81 35.817 82/90 38.75 73/78 33.898
(d) Numerical results for CGSQN
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
388/390 360.765 394/396 359.533 402/406 369.002 406/411 370.407 412/417 377.177 418/423 380.983 420/447 385.429
162/164 147.404 159/161 144.83 153/155 139.324 145/147 131.524 138/140 126.15 131/133 120.291 127/129 116.127
86/91 78.188 86/91 78.234 86/90 79.716 82/85 75.082 82/85 75.504 79/81 72.525 77/79 69.888
75/79 68.765 75/79 68.297 74/78 67.345 72/76 65.676 75/79 68.889 78/84 71.962 73/77 67.236
83/89 76.175 71/75 64.755 80/89 73.367 71/75 64.99 71/75 65.66 71/75 65.91 71/75 64.694
72/77 65.395 72/78 66.612 72/78 66.502 87/105 79.934 72/78 65.723 90/124 82.196 89/124 82.415
72/77 66.8 72/77 65.598 81/91 74.568 85/103 78.734 96/131 87.999 90/119 81.9 132/291 120.666
(e) Numerical results for CGH
ρ
t 0
0.1
0.3
0.5
0.7
0.9
1
0 (CGGN)
388/390 357.38 394/396 356.741 404/408 370.095 410/415 375.57 416/421 376.069 411/416 371.92 422/455 383.073
162/164 147.701 159/161 144.752 153/155 139.9 146/148 132.959 138/140 126.407 131/133 120.572 128/130 116.829
86/91 78.702 85/90 78.452 86/91 78.343 84/87 76.347 81/84 75.332 84/86 76.081 81/83 74.631
75/79 70.075 75/79 69.467 74/78 67.314 72/76 65.13 70/74 64.85 75/79 68.438 78/82 71.807
83/89 75.753 72/76 65.739 74/78 67.189 70/74 63.742 71/75 65.956 71/75 65.114 71/75 64.927
72/77 65.972 74/81 68.094 81/93 74.397 97/113 87.734 93/122 85.347 84/112 76.221 87/100 80.355
72/77 66.955 72/77 66.534 84/96 76.315 81/90 74.115 76/89 70.247 96/115 86.658 94/132 86.767
0.1 0.3 0.5 0.7 0.9 1
(ii) We obtain the result immediately from (i) by setting ρ_k = 0.
(iii) Since it can be proved in the same way as (i), we omit the proof.
4.2. Global convergence under the descent condition and the strong Wolfe conditions

The following useful lemma was proved by Dai et al. (see Corollary 2.4 in [29]).
Table 7 Wood function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
83/234
108/268
108/268
832/1171
Failed
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
83/234
102/274
83/234
70/211
142/350
120/305
120/307
(c) Numerical results for CGLM
ν
t
1 0.1 0.01 0.001
0
0.1
0.3
0.5
0.7
0.9
1
203/406 91/231 74/202 73/204
197/391 98/249 74/205 75/203
207/407 116/260 66/189 80/205
201/402 76/211 89/228 79/212
197/389 80/219 65/188 66/191
213/414 93/241 70/197 57/175
203/405 88/234 55/169 50/159
(d) Numerical results for CGSQN
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
76/209 80/208 64/198 65/191 70/205 69/204 80/217
79/213 86/228 70/207 47/159 62/192 71/201 69/204
81/208 102/257 70/206 45/155 74/215 65/193 71/203
78/207 91/232 69/204 48/161 58/181 61/190 83/225
74/203 101/245 70/207 48/160 69/204 54/177 68/203
102/242 76/205 69/205 47/160 70/209 52/174 91/243
61/180 79/211 70/204 46/158 46/165 59/186 86/235
0
0.1
0.3
0.5
0.7
0.9
1
76/209 53/174 63/191 71/200 45/158 71/207 30/135
79/213 63/195 47/158 73/209 95/234 49/171 28/129
81/208 50/170 52/169 62/188 91/238 21/122 36/148
78/207 71/207 53/172 62/190 76/209 39/155 48/169
74/203 66/198 45/158 57/181 69/197 89/243 37/148
102/242 64/194 78/215 62/191 74/205 117/309 57/184
61/180 54/176 47/162 35/147 70/201 112/299 92/240
(e) Numerical results for CGH
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
Lemma 6. Suppose that Assumptions A1 and A2 hold. Consider any conjugate gradient method in the form (1.2)–(1.3), where d_k and α_k satisfy the descent condition (4.8) and the strong Wolfe conditions (4.10) and (4.12), respectively. If

    Σ_{k≥0} 1/‖d_k‖² = ∞,

then the following holds:

    lim inf_{k→∞} ‖g_k‖ = 0.

By using Theorems 1 and 3, Corollary 2 and Lemma 6, we obtain the following convergence theorem. The proofs can be done in the same way as those of Theorem 5.

Theorem 7. Suppose that Assumptions A1 and A2 hold. Consider the conjugate gradient method in the form (1.2)–(1.3). Assume that t_k satisfies 0 ≤ t_k ≤ t̄ for a constant t̄, and that d_k and α_k satisfy the descent condition (4.8) and the strong Wolfe conditions (4.10) and (4.12), respectively.
(i) If Assumption A3 holds and ρ_k satisfies 0 ≤ ρ_k ≤ ρ̄ < 1/(Lγ′(λγ)²) for some positive constant ρ̄, then the conjugate gradient methods with β_k^SQN and β_k^H converge globally in the sense that lim inf_{k→∞} ‖g_k‖ = 0.
(ii) If Assumption A3 holds, then the conjugate gradient method with β_k^GN converges globally in the sense that lim inf_{k→∞} ‖g_k‖ = 0.
(iii) If ν_k satisfies ν̲ ≤ ν_k ≤ ν̄ for positive constants ν̲ and ν̄, then the conjugate gradient method with β_k^LM converges globally in the sense that lim inf_{k→∞} ‖g_k‖ = 0.
Table 8 Kowalik and Osborne function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
47/229
147/466
31/230
639/1022
716/1073
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
76/333
27/178
25/173
42/216
66/256
89/315
66/255
(c) Numerical results for CGLM
ν
t 0
0.1
0.3
0.5
0.7
0.9
1
1 0.1 0.01 0.001
Failed 237/706 154/446 334/842
Failed 279/706 124/435 235/689
Failed 221/689 50/267 254/681
Failed 210/670 45/257 61/272
Failed 326/758 56/236 93/296
Failed 392/1136 66/269 264/648
Failed 297/760 63/289 92/390
(d) Numerical results for CGSQN
ρ
t 0
0.1
0.3
0.5
0.7
0.9
1
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
130/416 69/295 88/312 162/460 150/456 331/832 216/575
93/373 51/274 70/297 41/244 203/582 116/396 367/884
109/411 60/277 127/433 59/288 63/295 77/330 93/369
138/468 52/277 29/217 65/288 110/377 317/830 50/265
69/264 67/274 53/230 55/251 71/283 90/311 59/252
90/314 69/291 99/326 23/169 59/250 37/207 121/373
100/430 92/392 60/295 113/404 61/338 120/472 154/494
0
0.1
0.3
0.5
0.7
0.9
1
130/416 52/240 124/403 87/298 35/207 54/264 45/248
93/373 50/256 58/289 56/273 49/262 317/864 59/275
109/411 41/250 50/271 116/406 54/268 173/523 58/281
138/468 86/357 43/249 40/252 54/261 193/592 157/487
69/264 67/265 134/420 51/226 43/221 95/342 119/368
90/314 89/321 74/280 43/208 92/317 73/294 150/410
100/430 56/324 55/297 59/296 47/298 56/313 46/267
(e) Numerical results for CGH
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
5. Numerical results

In this section, some numerical results are shown. All routines are written in JAVA 1.6 and run on a DELL Precision 490 with 4 GB memory under Windows Vista. We tested the following conjugate gradient methods to compare our proposed methods with other conjugate gradient methods.

1. FR     : The Fletcher–Reeves method (1.5)
2. PR     : The Polak–Ribière method (1.6)
3. PR+    : The Polak–Ribière Plus method (1.7)
4. HS     : The Hestenes–Stiefel method (1.4)
5. DY     : The Dai–Yuan method (1.8)
6. DL+    : The Dai–Liao method (2.8)
7. CGGN   : Our method with β_k^GN in (3.17)
8. CGLM   : Our method with β_k^LM in (3.18)
9. CGSQN  : Our method with β_k^SQN in (3.19)
10. CGH   : Our method with β_k^H in (3.20)

We used test problems from Moré, Garbow and Hillstrom [30]. These test problems are of the form "min ‖r(x)‖²". To match them with (1.1), we divided the objective functions by two and used test problems of the form "min (1/2)‖r(x)‖²". Table 1 gives the function name, the numbers n and p, the table number, and the residual of the objective
Table 9 Bard function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
18/119
21/125
14/110
75/218
45/171
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
18/114
20/118
28/140
31/144
30/142
29/139
22/130
0
0.1
0.3
0.5
0.7
0.9
1
Failed 137/324 25/130 16/115
Failed 129/275 24/131 17/117
Failed 129/277 27/138 14/111
Failed 127/279 25/131 13/110
Failed 134/345 25/130 12/107
Failed 132/341 25/132 9/99
Failed 132/341 26/133 8/97
(c) Numerical results for CGLM
ν
t
1 0.1 0.01 0.001
(d) Numerical results for CGSQN
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
14/111 21/127 14/112 11/105 32/147 16/117 19/118
14/111 10/103 18/117 15/116 34/147 16/116 30/138
14/111 66/227 18/118 17/117 30/143 18/119 20/124
14/112 18/119 11/103 18/120 32/145 18/119 19/121
9/99 16/115 18/117 12/111 10/101 19/119 22/124
9/99 15/111 19/118 13/116 31/143 21/126 21/123
9/99 13/107 20/120 20/130 32/145 20/121 21/123
0
0.1
0.3
0.5
0.7
0.9
1
14/111 14/111 14/111 13/111 9/100 32/152 16/115
14/111 14/111 14/111 9/100 42/176 38/162 24/133
14/111 14/111 12/107 9/100 46/177 65/225 12/105
14/112 12/107 9/100 9/100 69/234 40/172 34/151
9/99 9/99 9/100 9/100 22/131 38/163 9/99
9/99 9/99 9/100 8/98 12/107 42/168 28/144
9/99 9/99 9/100 9/100 20/124 25/135 13/110
(e) Numerical results for CGH
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
function at the solution. The numerical results for zero residual problems are summarized in Tables 3–7, and the numerical results for non-zero residual problems are summarized in Tables 8–12. For large-scale problems (Ex. Rosenbrock, Ex. Powell, Penalty 1, Broyden Tridiagonal and Trigonometric), we made use of the sparsity of the Jacobian matrix in computing the matrix–vector products in β_k. Table 2 shows the number of non-zero elements of the Jacobian matrix. Since the calculation of J(x_k)^T r(x_k) in (3.1) is expensive, we compute g_k directly. Accordingly, J(x_k) and r(x_k) are calculated only for β_k. The step size α_k is computed so that the strong Wolfe conditions with δ = 0.0001 and σ = 0.1 are satisfied. We used Moré's line search method [31] following Nocedal [32]. At each iteration, the initial step size is chosen as

    α_k^(0) = α_{k-1} ( g_{k-1}^T d_{k-1} / g_k^T d_k ).
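A minimal sketch (an illustrative assumption about this implementation detail, not the authors' code) of this initial step size choice:

```java
// Sketch: alpha_k^(0) = alpha_{k-1} * (g_{k-1}^T d_{k-1}) / (g_k^T d_k).
public final class InitialStep {
    static double initialStep(double alphaOld, double[] gOld, double[] dOld,
                              double[] g, double[] d) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < g.length; i++) {
            num += gOld[i] * dOld[i];   // g_{k-1}^T d_{k-1}
            den += g[i] * d[i];         // g_k^T d_k
        }
        return alphaOld * num / den;
    }
}
```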
We used the same line search in all the methods. We restart the algorithm by setting d_k = -g_k when an ascent direction is generated. Tables 3–12 summarize our numerical results. In each table, (a) gives numerical results for HS, PR, PR+, FR, and DY, while (b), (c), (d), and (e) give numerical results for DL+, CGLM, CGSQN, and CGH, respectively. In our numerical experiments, we fixed the values of the parameters t_k and ρ_k, namely, we set t_k = t and ρ_k = ρ. Note that the case ρ = 0 in (d) and (e) corresponds to the numerical result for CGGN. The parameters ρ and t are chosen as 0, 0.1, 0.3, 0.5, 0.7, 0.9, and 1, and the parameter ν is chosen as 1.0, 0.1, 0.01, and 0.001. Tables 3–12 give the numerical results in the form (number of iterations/number of function evaluations). In addition, for large-scale problems, we give the CPU time (sec) in the second row of each result. Since we could not find any difference between the CPU times of all the methods for small-scale problems, we omit them. If the number of iterations exceeds 1000, we write "Failed". The stopping criterion is

    ‖g_k‖ ≤ 10^{-5}.

From our numerical experiments, we have the following observations:
Table 10 Brown and Dennis function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
56/191
65/200
45/165
59/181
62/188
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
29/127
41/149
33/139
29/128
26/123
26/122
31/132
(c) Numerical results for CGLM
ρ 1 0.1 0.01 0.001
t 0
0.1
0.3
0.5
0.7
0.9
1
56/186 77/227 70/219 67/210
60/204 62/206 67/217 63/205
52/192 44/171 61/206 64/210
62/202 66/219 56/195 65/212
49/175 58/201 73/234 53/198
77/238 55/201 57/206 86/250
58/198 68/218 70/218 65/212
(d) Numerical results for CGSQN
ρ 0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
t 0
0.1
0.3
0.5
0.7
0.9
1
85/258 71/231 68/223 42/162 49/181 40/165 50/174
62/207 72/228 53/194 50/185 33/151 36/157 54/191
67/219 53/192 46/174 88/257 36/154 26/142 45/174
54/191 58/205 46/175 55/192 57/204 47/182 42/175
60/205 58/207 63/209 41/166 46/180 47/178 38/164
78/241 72/231 34/150 52/192 38/159 47/178 42/173
79/247 56/193 65/219 36/158 53/186 55/192 40/162
0
0.1
0.3
0.5
0.7
0.9
1
85/258 69/232 55/186 57/186 48/169 47/189 31/137
62/207 71/227 42/172 69/226 52/191 32/149 42/169
67/219 71/225 45/179 47/176 46/178 36/156 42/175
54/191 55/198 55/193 33/151 46/176 38/160 52/187
60/205 52/191 47/183 45/175 30/153 41/163 33/148
78/241 50/187 53/196 32/146 59/207 37/161 51/185
79/247 59/200 55/191 36/159 38/164 87/260 28/147
(e) Numerical results for CGH
ρ 0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
t
• From the viewpoint of the number of iterations and the number of function evaluations: From Tables 3–12, we see that ν in CGLM should be taken close to 0 if CGGN performs well (see Tables 3, 6, 7 and 9, for example). On the other hand, CGGN and CGLM with ν = 0.001 perform poorly for large residual problems (see Tables 11 and 12). CGSQN and CGH are efficient if we make a suitable choice of the parameters; in particular, these methods perform very well in Tables 3, 7 and 9. For some problems, the numbers of iterations of CGSQN and CGH decrease as both parameters ρ and t approach 1 (see Tables 3, 8 and 10–12). In Table 4, we do not observe any clear tendency in the numerical behavior as the parameters ρ and t change. In Table 5, CGSQN and CGH are not affected by the choice of the parameters and give almost the same results. Theoretically, we expect that CGSQN and CGH with ρ = 1 contain more information about the Hessian of the objective function. In addition, CGSQN and CGH with ρ = 1, t = 1 perform well in our numerical experiments. Therefore, we recommend the choice ρ = 1, t = 1 for CGSQN and CGH.

• From the viewpoint of CPU time for large-scale problems (see Tables 3–6 and 11): For large-scale problems, our methods needed more CPU time than the other conjugate gradient methods did, because our methods require calculations of the Jacobian matrix J(x_k), the residual r(x_k) and matrix–vector products. In particular, if n and p are large and the Jacobian matrix J(x_k) is dense, then our methods may perform poorly (see Table 6). In order to enhance the efficiency of our methods, we need to devise efficient ways to calculate the Jacobian matrix J(x_k) and the matrix–vector products; this is one of our further studies.

Figs. 1–3 present the values of log10 |f(x_k) - f(x*)| at each iteration for the Extended Rosenbrock function, the Extended Powell Singular function, and the Freudenstein and Roth function, respectively. In each figure, we show the best results of DL+, CGGN, CGLM, CGSQN and CGH, and the results of CGSQN and CGH with ρ = 1 and t = 1. In Table 4, CGSQN (best) is CGGN (best), and CGLM performed poorly, so we omit CGSQN (best) and CGLM (best) in Fig. 2. Since, in Table 12, CGGN did not converge within 1000 iterations, we omit CGGN (best) in Fig. 3. From Fig. 1, we can observe that all the methods converge superlinearly. In Fig. 2, all the methods seem to converge linearly. In Fig. 3, it seems that all the methods achieve superlinear convergence except for CGLM.
Table 11 Penalty I function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
46/222 25.927
28/154 17.956
31/161 18.798
19/112 13.042
129/294 34.601
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
31/159 18.517
17/175 20.342
19/194 22.62
28/243 28.408
23/217 25.272
21/215 25.069
29/228 26.566
(c) Numerical results for CGLM
ν
t 0
0.1
0.3
0.5
0.7
0.9
1
1
Failed
Failed
Failed
Failed
Failed
Failed
0.1 0.01
Failed 474/825 161.585 Failed
Failed Failed
Failed Failed
Failed Failed
Failed Failed
104/407 61.683
Failed
Failed
Failed
189/586 94.224 Failed 46/362 48.251 Failed
0.001
Failed Failed Failed
(d) Numerical results for CGSQN
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
0
0.1
0.3
0.5
0.7
0.9
1
414/1835 293.467 32/177 26.816 30/248 34.679 17/154 21.075 21/147 21.138 19/204 27.097 18/160 22.105
Failed
Failed
Failed
Failed
Failed
Failed
24/160 23.307 32/225 32.323 21/159 22.464 22/159 22.761 35/234 33.899 13/138 18.642
21/148 21.388 36/215 31.996 53/257 40.17 31/181 26.926 25/172 24.742 17/146 20.186
39/173 27.924 26/198 28.095 47/239 36.645 34/194 28.985 20/181 24.898 27/169 24.789
38/184 29.016 68/306 48.36 14/146 19.532 39/225 33.587 20/169 23.416 98/324 56.254
22/153 22.215 33/210 30.685 23/167 23.821 32/221 31.7 28/197 28.236 81/278 47.518
34/165 26.083 46/231 35.599 25/168 24.243 29/202 29 30/197 28.61 18/150 20.888
0
0.1
0.3
0.5
0.7
0.9
1
414/1835 293.28 28/264 36.098 21/210 28.579 28/212 30.186 20/199 27.159 25/211 29.375 19/230 30.217
Failed
Failed
Failed
Failed
Failed
Failed
33/238 34.273 31/212 30.795 20/163 22.995 21/198 27.268 26/209 29.297 28/252 34.726
63/282 45.38 32/216 31.559 28/219 30.966 21/189 26.193 22/189 26.161 26/208 29.188
42/277 40.607 46/257 38.984 30/206 29.827 63/314 48.797 23/202 28.205 29/201 28.969
26/214 30.03 44/232 35.646 25/202 28.423 42/255 38.001 26/213 29.827 65/282 45.178
41/254 37.783 28/193 27.815 25/207 29 31/225 32.09 33/206 30.342 26/198 28.065
33/258 36.536 26/222 30.966 25/181 25.974 28/220 31.013 22/188 26.052 17/170 23.057
(e) Numerical results for CGH
ρ
t
0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
In the numerical experiments above, we used a fixed parameter ρ as already mentioned. However, it is reasonable to choose a parameter ρ_k adaptively instead of using a fixed parameter ρ. The Gauss–Newton method possesses the quadratic convergence property in the case of zero residual problems. In order to exploit this advantage, we propose the following parameter ρ̂_k:

    ρ̂_k = min{ρ, ‖r(x_k)‖}.

Tables 13 and 14 give numerical results of CGSQN and CGH with ρ̂_k for the Extended Powell Singular function and the Wood function, respectively. Unfortunately, in both tables, CGSQN and CGH with ρ̂_k could not enhance the efficiency.
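A one-line sketch of this adaptive rule (illustrative only):

```java
// Sketch of the adaptive parameter rho_hat_k = min{ rho, ||r(x_k)|| } proposed above.
public final class AdaptiveRho {
    static double rhoHat(double rho, double[] r) {
        double sq = 0.0;
        for (double ri : r) sq += ri * ri;
        return Math.min(rho, Math.sqrt(sq));
    }
}
```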
Table 12 Freudenstein and Roth function. (a) Numerical results for HS, PR, PR+, FR, and DY HS
PR
PR+
FR
DY
6/89
8/94
8/97
24/124
21/122
(b) Numerical results for DL+ t
0
0.1
0.3
0.5
0.7
0.9
1
6/93
6/93
7/93
7/93
8/94
9/100
9/100
0
0.1
0.3
0.5
0.7
0.9
1
8/97 41/160 381/885 Failed
10/105 39/156 Failed Failed
14/121 31/139 329/858 Failed
10/99 25/126 Failed 10/100
11/102 9/96 196/555 39/152
11/102 13/98 12/103 44/172
9/97 13/102 21/120 92/244
0
0.1
0.3
0.5
0.7
0.9
1
Failed 75/230 25/126 15/107 11/96 9/96 6/90
Failed 73/225 23/123 15/104 11/100 8/93 6/89
Failed 20/130 22/120 15/106 11/98 7/90 5/86
Failed 39/154 20/117 14/103 10/96 8/99 5/87
Failed 23/115 18/112 11/98 10/97 7/91 6/89
Failed 13/102 14/101 13/103 10/97 7/91 6/89
Failed 10/97 14/104 13/100 10/97 7/91 6/89
0
0.1
0.3
0.5
0.7
0.9
1
Failed 75/229 25/128 16/111 9/95 8/92 7/90
Failed 73/224 25/128 14/105 10/96 8/92 7/89
Failed 55/187 19/115 14/103 11/97 7/94 6/88
Failed 35/147 17/108 11/103 10/96 7/90 6/88
Failed 33/146 16/102 12/99 10/96 7/90 7/90
Failed 11/98 13/102 12/97 10/97 7/91 6/89
Failed 8/92 12/101 12/100 11/96 7/91 6/90
(c) Numerical results for CGLM
ν 1 0.1 0.01 0.001
t
(d) Numerical results for CGSQN
ρ 0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
t
(e) Numerical results for CGH
ρ 0 (CGGN) 0.1 0.3 0.5 0.7 0.9 1
t
Fig. 1. Extended Rosenbrock function.
6. Conclusions

In this paper, we have derived new conjugacy conditions by combining the idea of Dai and Liao [20] with the structured secant conditions, and we have proposed nonlinear conjugate gradient methods with (3.17), (3.18), (3.19), and (3.20).
Table 13
Extended Powell Singular function (ρk = ρ̂k). Each cell gives the number of iterations/the number of function evaluations, followed by the CPU time.

(a) Numerical results for CGSQN
t \ ρ   0 (CGGN)         0.1              0.3              0.5              0.7              0.9              1
0       97/303 29.374    68/240 21.622    72/222 20.795    100/307 29.359   124/346 34.835   141/401 39.515   84/250 23.993
0.1     211/570 59.499   125/347 35.147   145/402 40.279   79/266 24.538    65/240 21.045    109/317 31.091   104/331 31.278
0.3     149/398 41.465   230/606 63.351   134/404 39.796   92/288 27.471    108/324 31.434   50/207 17.191    177/520 51.105
0.5     113/346 33.914   73/264 23.556    242/669 68.796   98/304 29.001    122/354 34.57    378/876 96.579   271/718 74.397
0.7     48/191 16.395    181/541 53.415   92/298 28.049    179/502 50.825   142/403 40.216   108/319 30.654   202/556 56.518
0.9     202/557 57.299   127/382 37.299   79/261 24.196    136/410 40.154   60/212 18.751    141/403 39.733   83/266 24.414
1       110/338 33.119   101/309 29.703   92/297 28.174    115/355 34.335   203/528 54.709   45/186 15.366    134/388 37.923

(b) Numerical results for CGH
t \ ρ   0 (CGGN)         0.1              0.3              0.5              0.7                0.9              1
0       97/303 29.39     90/286 27.269    89/284 26.832    119/371 35.724   421/1140 116.922   87/275 25.724    107/331 31.652
0.1     211/570 59.608   89/313 28.47     171/502 49.92    177/503 50.529   50/175 15.428      93/301 28.221    200/563 56.815
0.3     149/398 41.465   81/269 25.131    215/580 60.013   100/311 29.796   95/305 28.595      105/349 32.401   197/563 56.301
0.5     113/346 33.883   181/489 50.341   201/560 57.298   192/514 53.368   128/390 37.721     145/419 41.434   102/306 29.375
0.7     48/191 16.427    223/607 62.79    106/312 30.67    214/557 58.359   99/305 29.125      118/356 34.461   325/853 89.248
0.9     202/557 57.704   146/424 42.12    131/379 37.549   188/495 51.496   121/348 34.273     148/433 42.697   97/323 29.999
1       110/338 33.072   175/490 49.436   133/397 39.125   145/433 42.432   97/299 28.314      130/402 38.595   136/374 37.94
Table 14
Wood function (ρk = ρ̂k). Each cell gives the number of iterations/the number of function evaluations.

(a) Numerical results for CGSQN
t \ ρ   0 (CGGN)   0.1      0.3      0.5      0.7      0.9       1
0       76/209     84/212   70/206   43/154   83/222   93/243    81/221
0.1     79/213     94/238   68/202   43/154   73/209   73/210    79/217
0.3     81/208     88/226   69/205   42/151   82/216   69/206    83/226
0.5     78/207     88/229   69/205   42/151   83/218   89/243    79/212
0.7     74/203     82/211   67/199   42/151   81/221   107/269   107/250
0.9     102/242    80/213   70/204   41/149   74/202   71/204    89/225
1       61/180     84/222   69/203   41/149   56/178   71/205    87/222

(b) Numerical results for CGH
t \ ρ   0 (CGGN)   0.1      0.3      0.5       0.7      0.9       1
0       76/209     54/177   53/170   40/157    47/161   45/167    27/133
0.1     79/213     49/167   59/178   36/150    70/196   74/212    27/131
0.3     81/208     63/199   64/189   81/230    90/223   35/152    35/152
0.5     78/207     57/181   60/182   34/150    73/205   39/157    30/133
0.7     74/203     55/178   54/172   25/129    78/217   62/191    26/127
0.9     102/242    64/192   38/146   50/172    78/213   119/310   45/163
1       61/180     74/210   72/215   104/269   65/192   90/257    25/128
We emphasize that conjugate gradient methods which make use of the structure of the objective function have hardly been studied. Under two kinds of assumptions on the search direction and the line search conditions, we have shown the global convergence properties of the proposed methods. Finally, we have presented numerical results to compare our methods
Fig. 2. Extended Powell Singular function.
Fig. 3. Freudenstein and Roth function.
with typical conjugate gradient methods. In the numerical experiments, our methods performed very well in terms of the numbers of iterations and function evaluations when the parameters were chosen suitably. Our further interests are to choose the values of tk and ρk adaptively and to compute the Jacobian matrix J(xk) and the related matrix–vector products efficiently.

Acknowledgements
The authors would like to thank the referees for their valuable comments. The second and third authors are supported in part by the Grant-in-Aid for Scientific Research (C) 21510164 of the Japan Society for the Promotion of Science.

References
[1] J.E. Dennis Jr., D.M. Gay, R.E. Welsch, An adaptive nonlinear least-squares algorithm, ACM Transactions on Mathematical Software 7 (1981) 348–368.
[2] J.E. Dennis Jr., H.J. Martinez, R.A. Tapia, Convergence theory for the structured BFGS secant method with an application to nonlinear least squares, Journal of Optimization Theory and Applications 61 (1989) 161–178.
[3] M. Al-Baali, R. Fletcher, Variational methods for non-linear least-squares, Journal of the Operational Research Society 36 (1985) 405–421.
[4] J.R. Engels, H.J. Martinez, Local and superlinear convergence for partially known quasi-Newton methods, SIAM Journal on Optimization 1 (1991) 42–56.
[5] H. Yabe, N. Yamaki, Local and superlinear convergence of structured quasi-Newton methods for nonlinear optimization, Journal of the Operations Research Society of Japan 39 (1996) 541–557.
[6] J. Huschens, On the use of product structure in secant methods for nonlinear least squares problems, SIAM Journal on Optimization 4 (1994) 108–129.
[7] H. Yabe, H. Ogasawara, Quadratic and superlinear convergence of the Huschens method for nonlinear least squares problems, Computational Optimization and Applications 10 (1998) 79–103.
[8] M.R. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards 49 (1952) 409–436.
[9] R. Fletcher, C.M. Reeves, Function minimization by conjugate gradients, Computer Journal 7 (1964) 149–154.
[10] G. Zoutendijk, Nonlinear programming, computational methods, in: J. Abadie (Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, pp. 37–86.
[11] M. Al-Baali, Descent property and global convergence of the Fletcher–Reeves method with inexact line search, IMA Journal of Numerical Analysis 5 (1985) 121–124.
[12] H.W. Sorenson, Comparison of some conjugate direction procedures for function minimization, Journal of The Franklin Institute 288 (1969) 421–441.
[13] J. Nocedal, S.J. Wright, Numerical Optimization, Second Edition, in: Springer Series in Operations Research, Springer-Verlag, New York, 2006.
[14] M.J.D. Powell, Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics, vol. 1066, Springer-Verlag, Berlin, 1984, pp. 122–141.
[15] M.J.D. Powell, Convergence properties of algorithms for nonlinear optimization, SIAM Review 28 (1986) 487–500.
[16] J.C. Gilbert, J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization 2 (1992) 21–42.
[17] Y.H. Dai, Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property, SIAM Journal on Optimization 10 (1999) 177–182.
[18] W.W. Hager, H. Zhang, A survey of nonlinear conjugate gradient methods, Pacific Journal of Optimization 2 (2006) 35–58.
[19] A. Perry, A modified conjugate gradient algorithm, Operations Research 26 (1978) 1073–1078.
[20] Y.H. Dai, L.Z. Liao, New conjugacy conditions and related nonlinear conjugate gradient methods, Applied Mathematics and Optimization 43 (2001) 87–101.
[21] H. Yabe, M. Takano, Global convergence properties of nonlinear conjugate gradient methods with modified secant condition, Computational Optimization and Applications 28 (2004) 203–225.
[22] W. Zhou, L. Zhang, A nonlinear conjugate gradient method based on the MBFGS secant condition, Optimization Methods and Software 21 (2006) 707–714.
[23] J.A. Ford, Y. Narushima, H. Yabe, Multi-step nonlinear conjugate gradient methods for unconstrained minimization, Computational Optimization and Applications 40 (2008) 191–216.
[24] J.Z. Zhang, N.Y. Deng, L.H. Chen, New quasi-Newton equation and related methods for unconstrained optimization, Journal of Optimization Theory and Applications 102 (1999) 147–167.
[25] J.Z. Zhang, C.X. Xu, Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations, Journal of Computational and Applied Mathematics 137 (2001) 269–278.
[26] D.H. Li, M. Fukushima, A modified BFGS method and its global convergence in nonconvex minimization, Journal of Computational and Applied Mathematics 129 (2001) 15–35.
[27] J.A. Ford, I.A. Moghrabi, Alternative parameter choices for multi-step quasi-Newton methods, Optimization Methods and Software 2 (1993) 357–370.
[28] J.A. Ford, I.A. Moghrabi, Multi-step quasi-Newton methods for optimization, Journal of Computational and Applied Mathematics 50 (1994) 305–323.
[29] Y.H. Dai, J.Y. Han, G.H. Liu, D.F. Sun, H.X. Yin, Y. Yuan, Convergence properties of nonlinear conjugate gradient methods, SIAM Journal on Optimization 10 (1999) 345–358.
[30] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7 (1981) 17–41.
[31] J.J. Moré, D.J. Thuente, Line search algorithms with guaranteed sufficient decrease, ACM Transactions on Mathematical Software 20 (1994) 286–307.
[32] J. Nocedal, http://www.ece.northwestern.edu/~nocedal/software.html.