A Modified Quasi-Newton Method for Nonlinear Equations

Xiaowei Fang^{a,b,∗}, Qin Ni^{b}, Meilan Zeng^{b,c}

^{a} Department of Mathematics, Huzhou University, Huzhou Zhejiang 313000, China
^{b} College of Sciences, Nanjing University of Aeronautics and Astronautics, Nanjing Jiangsu 210016, China
^{c} College of Mathematics and Statistics, Hubei Engineering University, Xiaogan Hubei 432000, China
Abstract

In this paper, a modified quasi-Newton method is proposed for solving the nonlinear equation $F(x) = 0$, based on a new quasi-Newton equation. The usual quasi-Newton equation is $B_{k+1} s_k = y_k$, where $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. The new quasi-Newton equation is $B_{k+1} \tilde{s}_k = \tilde{y}_k$, in which $\tilde{s}_k$ is built from the iterates $x_{k+1}, x_k, x_{k-1}$ and $\tilde{y}_k$ from the function values $F(x_{k+1}), F(x_k), F(x_{k-1})$. The new quasi-Newton equation exploits additional information by assuming a quadratic relationship among the data from the last three iterates. The modified quasi-Newton method based on this equation possesses local superlinear convergence. Numerical experiments show that the modified quasi-Newton method is promising.

Keywords: nonlinear equations, new quasi-Newton equation, modified quasi-Newton method, local superlinear convergence

2010 MSC: 65H10, 90C53
This work was supported by the National Natural Science Foundation of China (11071117, 11274109), the Natural Science Foundation of Jiangsu Province (BK20141409) and the Natural Science Foundation of Huzhou University (KX21072).

∗ Corresponding author.
Email addresses: [email protected] (Xiaowei Fang), [email protected] (Qin Ni), [email protected] (Meilan Zeng)
1. Introduction

Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable. Consider the system of nonlinear equations
\[
F(x) = 0, \quad x \in \mathbb{R}^n. \tag{1}
\]
There are numerous iterative algorithms for solving the system of equations (1), such as Newton's method [1, 2, 3], quasi-Newton methods [4, 5, 6, 7, 8, 9, 10], spectral methods [11, 12, 13] and tensor methods [14, 15]. Newton's method is a frequently used iterative method for solving nonlinear equations because it converges rapidly from any sufficiently good starting point. Its main drawback is that the direct computation of the Jacobian is computationally expensive. This fact motivated the development of quasi-Newton methods, which generate the iterate $x_{k+1}$ by
\[
B_k(x_{k+1} - x_k) = -F(x_k), \tag{2}
\]
where the matrix $B_k$ is an approximation of the Jacobian matrix $J(x_k)$. When the analytically computed Jacobian $J(x_k)$ is used in place of $B_k$, the original Newton's method is recovered. One possible way to approximate the Jacobian matrix $J(x_k)$ is to require that $B_{k+1}$ satisfy the following set of equations:
\[
B_{k+1}(x_{k+1} - x_i) = F(x_{k+1}) - F(x_i), \quad \text{for } \max\{k - n + 1, 0\} \le i \le k. \tag{3}
\]
If $k \ge n - 1$ and $x_{k+1} - x_k, x_{k+1} - x_{k-1}, \cdots, x_{k+1} - x_{k-n+1}$ are linearly independent, then the $n^2$ unknown elements of $B_{k+1}$ are uniquely determined by the $n^2$ equations (3). Unfortunately, this strategy is not successful in practice [16]. The frequently used strategy is to require that $B_{k+1}$ satisfy the secant equation
\[
B_{k+1} s_k = y_k, \tag{4}
\]
where $s_k = x_{k+1} - x_k$, $y_k = F(x_{k+1}) - F(x_k)$. For the quasi-Newton equation (4), Broyden [4] introduced Broyden's good method (BGM), which has seen successful use in geophysical prospecting [17] and statistics [18]. The update formula of Broyden's good method is
\[
B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T}{s_k^T s_k}. \tag{5}
\]
Broyden [4] also proposed Broyden's bad method (BBM), for which
\[
B_{k+1} = B_k + \frac{(y_k - B_k s_k) y_k^T B_k}{y_k^T B_k s_k}. \tag{6}
\]
BGM performed significantly better than BBM on a small set of test problems [5]. Following Broyden's work, Powell [6] proposed a Broyden-like update formula and provided a Fortran subroutine for solving systems of nonlinear equations. Dennis [7] motivated and justified quasi-Newton methods as useful modifications of Newton's method for general and gradient nonlinear systems of equations. Martínez [8] proposed the combined good-bad method (CGBM), a hybrid of BGM and BBM. At each iteration, CGBM chooses to apply either BGM or BBM. The update is
\[
B_{k+1} =
\begin{cases}
B_k + \dfrac{(y_k - B_k s_k) s_k^T}{s_k^T s_k} & \text{if } \dfrac{|s_k^T s_{k-1}|}{|s_k^T B_k^{-1} y_k|} < \dfrac{|y_k^T y_{k-1}|}{y_k^T y_k}, \\[2ex]
B_k + \dfrac{(y_k - B_k s_k) y_k^T B_k}{y_k^T B_k s_k} & \text{otherwise.}
\end{cases} \tag{7}
\]
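For concreteness, the three classical updates (5)-(7) can be sketched in a few lines of Python. This illustration is ours, not the paper's; it assumes NumPy, with $s_k$, $y_k$, $s_{k-1}$, $y_{k-1}$ passed as 1-D arrays.

```python
import numpy as np

def bgm_update(B, s, y):
    """Broyden's good method, update (5): rank-one correction along s."""
    return B + np.outer(y - B @ s, s) / (s @ s)

def bbm_update(B, s, y):
    """Broyden's bad method, update (6): rank-one correction along y^T B."""
    Bty = B.T @ y                      # np.outer(., Bty) yields (.)(y^T B)
    return B + np.outer(y - B @ s, Bty) / (Bty @ s)

def cgbm_update(B, s, y, s_prev, y_prev):
    """Combined good-bad method, update (7): pick BGM or BBM per iteration."""
    lhs = abs(s @ s_prev) / abs(s @ np.linalg.solve(B, y))  # |s^T s_{k-1}| / |s^T B^{-1} y|
    rhs = abs(y @ y_prev) / (y @ y)                         # |y^T y_{k-1}| / (y^T y)
    return bgm_update(B, s, y) if lhs < rhs else bbm_update(B, s, y)
```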
Experiments on a set of small problems showed that CGBM requires fewer iterations than both BGM and BBM. Martínez [9] further discussed some practical and theoretical aspects of quasi-Newton methods. For unconstrained optimization, Zhang et al. [19] used a quadratic interpolation of the gradient of the objective function and proposed a new quasi-Newton equation. Zhang [20] presented a class of modified quasi-Newton equations with a vector parameter which use both available gradient and function value information. Wei et al. [21] proposed a class of new quasi-Newton equations using the third order Taylor expansion of the objective function. Following the work of Wei et al., Biglari et al. [22] proposed a new quasi-Newton method using a fourth order tensor model. Motivated by the ideas in [19, 20, 21, 22] and the update formula of Broyden's good method (5), we propose a modified quasi-Newton method for solving nonlinear equations. This method is based on a new quasi-Newton equation which exploits additional information by assuming a relationship among the data from the last three iterates. Under mild assumptions, the local superlinear convergence of this method is established. Numerical experiments show that the proposed method is promising.
The paper is organized as follows. Section 2 introduces the new quasi-Newton equation and the modified quasi-Newton method. The convergence analysis for the proposed method is presented in Section 3. In Section 4, we report the numerical results. Conclusions are drawn in Section 5.

2. New quasi-Newton equation and modified quasi-Newton method

In this section, we establish a new quasi-Newton equation. Based on the new quasi-Newton equation, a modified quasi-Newton method is proposed for solving a system of nonlinear equations.

Let $s_k = x_{k+1} - x_k$, $s_{k-1} = x_k - x_{k-1}$, $J_k = J(x_k)$, where $J(x_k)$ is the Jacobian matrix at the iterate $x_k$. Assume that
\[
x(\tau) = x_k + \frac{\tau(\tau + \|s_{k-1}\|)}{\|s_k\|(\|s_k\| + \|s_{k-1}\|)}\, s_k - \frac{\tau(\tau - \|s_k\|)}{\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)}\, s_{k-1};
\]
then $x(\|s_k\|) = x_{k+1}$, $x(0) = x_k$, $x(-\|s_{k-1}\|) = x_{k-1}$. It follows that
\[
\begin{aligned}
\left.\frac{dF(x(\tau))}{d\tau}\right|_{\tau = \|s_k\|}
&= \left. J(x(\tau)) \left[ \frac{2\tau + \|s_{k-1}\|}{\|s_k\|(\|s_k\| + \|s_{k-1}\|)}\, s_k - \frac{2\tau - \|s_k\|}{\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)}\, s_{k-1} \right] \right|_{\tau = \|s_k\|} \\
&= J_{k+1} \left[ \frac{2\|s_k\| + \|s_{k-1}\|}{\|s_k\|(\|s_k\| + \|s_{k-1}\|)}\, s_k - \frac{\|s_k\|}{\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)}\, s_{k-1} \right]. \quad (8)
\end{aligned}
\]
We use a quadratic function,
\[
w(\tau) = a\tau^2 + b\tau + c, \quad a, b, c \in \mathbb{R}^n, \tag{9}
\]
to approximate $F(x(\tau))$. The approximating function $w(\tau)$ is required to satisfy the following conditions:
\[
w(0) = F(x(0)) = F(x_k) = F_k, \tag{10}
\]
\[
w(\|s_k\|) = F(x(\|s_k\|)) = F(x_{k+1}) = F_{k+1}, \tag{11}
\]
\[
w(-\|s_{k-1}\|) = F(x(-\|s_{k-1}\|)) = F(x_{k-1}) = F_{k-1}. \tag{12}
\]
The conditions (10)-(12) can be rewritten as
\[
c = F_k, \qquad a\|s_k\|^2 + b\|s_k\| + c = F_{k+1}, \qquad a\|s_{k-1}\|^2 - b\|s_{k-1}\| + c = F_{k-1}.
\]
Hence, we have
\[
a = \frac{y_k\|s_{k-1}\| - y_{k-1}\|s_k\|}{\|s_k\|\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)}, \tag{13}
\]
\[
b = \frac{y_k\|s_{k-1}\|^2 + y_{k-1}\|s_k\|^2}{\|s_k\|\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)}, \tag{14}
\]
where
\[
y_k = F_{k+1} - F_k, \qquad y_{k-1} = F_k - F_{k-1}. \tag{15}
\]
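As a quick sanity check (our addition, not part of the paper), one can verify numerically that the coefficients (13)-(15) make $w(\tau)$ interpolate $F_{k-1}$, $F_k$ and $F_{k+1}$; the step lengths 0.7 and 1.3 below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
F_km1, F_k, F_kp1 = rng.standard_normal((3, 4))   # three residual vectors
ns, nsp = 0.7, 1.3                                # ||s_k||, ||s_{k-1}||
y_k, y_km1 = F_kp1 - F_k, F_k - F_km1             # (15)
D = ns * nsp * (ns + nsp)
a = (y_k * nsp - y_km1 * ns) / D                  # (13)
b = (y_k * nsp**2 + y_km1 * ns**2) / D            # (14)
c = F_k                                           # (10)
w = lambda t: a * t**2 + b * t + c                # (9)
assert np.allclose(w(ns), F_kp1) and np.allclose(w(-nsp), F_km1)
```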
By (9), (13) and (14), we have
\[
\begin{aligned}
w'(\tau)\big|_{\tau = \|s_k\|} &= 2a\|s_k\| + b \\
&= \frac{y_k\|s_{k-1}\|(2\|s_k\| + \|s_{k-1}\|) - y_{k-1}\|s_k\|^2}{\|s_k\|\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)} \\
&= \frac{y_k(2\|s_k\| + \|s_{k-1}\|)}{\|s_k\|(\|s_k\| + \|s_{k-1}\|)} - \frac{y_{k-1}\|s_k\|}{\|s_{k-1}\|(\|s_k\| + \|s_{k-1}\|)}. \quad (16)
\end{aligned}
\]
Let
\[
w'(\tau)\big|_{\tau = \|s_k\|} \approx \left.\frac{dF(x(\tau))}{d\tau}\right|_{\tau = \|s_k\|}.
\]
Then, by (8) and (16), we have
\[
J_{k+1}\left[ s_k - \frac{\|s_k\|^2}{\|s_{k-1}\|(2\|s_k\| + \|s_{k-1}\|)}\, s_{k-1} \right] \approx y_k - \frac{\|s_k\|^2}{\|s_{k-1}\|(2\|s_k\| + \|s_{k-1}\|)}\, y_{k-1}. \tag{17}
\]
Let
\[
\tilde{s}_k = s_k - \delta_k s_{k-1}, \qquad \tilde{y}_k = y_k - \delta_k y_{k-1}, \tag{18}
\]
where
\[
\delta_k = \frac{\|s_k\|^2}{\|s_{k-1}\|(2\|s_k\| + \|s_{k-1}\|)}. \tag{19}
\]
Then equation (17) can be rewritten as
\[
J_{k+1} \tilde{s}_k \approx \tilde{y}_k. \tag{20}
\]
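In code, the modified pair of (18)-(19) is a small helper. The following Python sketch is our illustration (NumPy and the Euclidean norm assumed), computing $\tilde{s}_k$, $\tilde{y}_k$ and $\delta_k$ from the last three iterates.

```python
import numpy as np

def modified_pair(x_prev, x_cur, x_next, F_prev, F_cur, F_next):
    """Return (s_tilde, y_tilde, delta) from (18)-(19), given the last
    three iterates x_{k-1}, x_k, x_{k+1} and their residuals."""
    s, s_prev = x_next - x_cur, x_cur - x_prev           # s_k, s_{k-1}
    y, y_prev = F_next - F_cur, F_cur - F_prev           # y_k, y_{k-1}
    ns, nsp = np.linalg.norm(s), np.linalg.norm(s_prev)
    delta = ns**2 / (nsp * (2.0 * ns + nsp))             # (19)
    return s - delta * s_prev, y - delta * y_prev, delta  # (18)
```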
If $B_{k+1}$ is the approximation of $J_{k+1}$, a new quasi-Newton equation is established as follows:
\[
B_{k+1} \tilde{s}_k = \tilde{y}_k. \tag{21}
\]
If we let $\delta_k = 0$, then (21) reduces to the usual quasi-Newton equation. Based on the new quasi-Newton equation, we propose the modified quasi-Newton algorithm.

Algorithm 2.1 (modified quasi-Newton method, MQNM)

Step 0: Initialization. Choose an initial point $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$, and set $k = 0$.

Step 1: Determination of the next iterate. Compute $F_k$ and solve $B_k s_k = -F_k$ for $s_k$. Set $x_{k+1} = x_k + s_k$.

Step 2: Update of the Jacobian approximation. Evaluate $\tilde{s}_k$, $y_k$, $\tilde{y}_k$ from (15) and (18). Compute $B_{k+1}$ by
\[
B_{k+1} =
\begin{cases}
B_k + \dfrac{(y_k - B_k s_k) s_k^T}{s_k^T s_k} & \text{if } k = 0, \\[2ex]
B_k + \dfrac{(\tilde{y}_k - B_k \tilde{s}_k) \tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} & \text{if } k \ge 1.
\end{cases} \tag{22}
\]
Step 3: Check the stopping condition. If the stopping condition is not met, set $k = k + 1$ and go to Step 1; otherwise output $x_k$, $F_k$ and stop.
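A minimal Python sketch of Algorithm 2.1 follows, reusing `modified_pair` from above. The tolerance $10^{-15}$ and the cap of 1000 iterations mirror the stopping rules used in Section 4; the dense `numpy.linalg.solve` step and the forward-difference $B_0$ in the usage example are our own assumptions, not prescriptions from the paper.

```python
def mqnm(F, x0, B0, tol=1e-15, max_iter=1000):
    """Modified quasi-Newton method (Algorithm 2.1), illustrative sketch."""
    x, B = x0.astype(float), B0.astype(float)
    Fx = F(x)
    x_prev = F_prev = None
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            break
        s = np.linalg.solve(B, -Fx)              # Step 1: solve B_k s_k = -F_k
        x_next = x + s
        F_next = F(x_next)
        if k == 0:                               # Step 2: update (22)
            y = F_next - Fx
            B = B + np.outer(y - B @ s, s) / (s @ s)
        else:
            st, yt, _ = modified_pair(x_prev, x, x_next, F_prev, Fx, F_next)
            B = B + np.outer(yt - B @ st, st) / (st @ st)
        x_prev, F_prev = x, Fx                   # keep last iterate for (18)
        x, Fx = x_next, F_next
    return x, Fx

# Usage example on the Extended Rosenbrock problem (test problem 12):
def F(x):
    f = np.empty_like(x)
    f[0::2] = 10.0 * (x[1::2] - x[0::2] ** 2)
    f[1::2] = 1.0 - x[0::2]
    return f

n = 10
x0 = np.tile([-1.2, 1.0], n // 2)
# Section 4 sets B0 = J(x0); a forward-difference Jacobian stands in for it here.
h = 1e-7
B0 = np.column_stack([(F(x0 + h * e) - F(x0)) / h for e in np.eye(n)])
x, Fx = mqnm(F, x0, B0)
```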
3. Local convergence analysis

In this section we investigate the local convergence behavior of the modified quasi-Newton algorithm based on the new quasi-Newton equation. We show that if $x_0$ is sufficiently close to a root $x^*$ at which $J(x^*)$ is nonsingular, and if $B_0$ is sufficiently close to $J(x^*)$, then the sequence of iterates $\{x_k\}$ converges superlinearly to $x^*$.

Lemma 3.1. Let $\|\cdot\|$ denote the $l_2$ vector norm. If $\|s_k\| \ne 0$ and $\|s_{k-1}\| \ne 0$, then we have $\|s_k\| \ge 2\delta_k\|s_{k-1}\|$ and
\[
\frac{3}{2}\|s_k\| \ge \|\tilde{s}_k\| \ge \frac{1}{2}\|s_k\|.
\]

Proof. From (19), we get
\[
\|s_k\| = \delta_k\, \frac{\|s_{k-1}\|(2\|s_k\| + \|s_{k-1}\|)}{\|s_k\|} \ge 2\delta_k\|s_{k-1}\|. \tag{23}
\]
Using (23), we obtain
\[
\|\tilde{s}_k\| = \|s_k - \delta_k s_{k-1}\| \ge \|s_k\| - \delta_k\|s_{k-1}\| \ge \frac{1}{2}\|s_k\|,
\]
and
\[
\|\tilde{s}_k\| = \|s_k - \delta_k s_{k-1}\| \le \|s_k\| + \delta_k\|s_{k-1}\| \le \frac{3}{2}\|s_k\|. \qquad \Box
\]
Lemma 3.2 ([16]). Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in the open convex set $D \subseteq \mathbb{R}^n$, and let $J(x) \in \mathrm{Lip}_\gamma(D)$, $x \in D$. Then, for any $v, u \in D$,
\[
\|F(v) - F(u) - J(x)(v - u)\| \le \frac{\gamma}{2}\left[\|v - x\| + \|u - x\|\right]\|v - u\|.
\]
Lemma 3.3. Let $D \subseteq \mathbb{R}^n$ be an open convex set containing $x_{k-1}$, $x_k$, $x_{k+1}$, with $x_{k-1} \ne x^*$ and $x_k \ne x^*$. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in $D$, $J(x) \in \mathrm{Lip}_\gamma(D)$, $B_k \in \mathbb{R}^{n \times n}$, and let $B_{k+1}$ be defined by (22). If $\|s_k\|_2 \ne 0$, $\|s_{k-1}\|_2 \ne 0$ and $x^* \in D$, then for either the $l_2$ or Frobenius matrix norm,
\[
\|E_{k+1}\| \le
\begin{cases}
\|E_k\| + \dfrac{\gamma}{2}(\|e_{k+1}\|_2 + \|e_k\|_2) & \text{if } k = 0, \\[1.5ex]
\|E_k\| + \dfrac{\gamma}{2}(2\|e_{k+1}\|_2 + 3\|e_k\|_2 + \|e_{k-1}\|_2) & \text{if } k \ge 1,
\end{cases} \tag{24}
\]
where
\[
E_j = B_j - J(x^*), \; j = k, k+1; \qquad e_i = x_i - x^*, \; i = k-1, k, k+1. \tag{25}
\]

Proof. Let $J_* = J(x^*)$. Subtracting $J_*$ from both sides of (22), we have two cases.
Case 1: $k \ge 1$. We have
\[
\begin{aligned}
E_{k+1} &= E_k + \frac{(\tilde{y}_k - B_k \tilde{s}_k)\tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} \\
&= E_k + \frac{(J_* \tilde{s}_k - B_k \tilde{s}_k)\tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} + \frac{(\tilde{y}_k - J_* \tilde{s}_k)\tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} \\
&= E_k\left(I - \frac{\tilde{s}_k \tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k}\right) + \frac{\left[y_k - \delta_k y_{k-1} - J_*(s_k - \delta_k s_{k-1})\right]\tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} \\
&= E_k\left(I - \frac{\tilde{s}_k \tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k}\right) + \frac{\left[y_k - J_* s_k - \delta_k(y_{k-1} - J_* s_{k-1})\right]\tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k}.
\end{aligned} \tag{26}
\]
For either the $l_2$ or Frobenius matrix norm, we have
\[
\|E_{k+1}\| \le \|E_k\| \left\| I - \frac{\tilde{s}_k \tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} \right\|_2 + \frac{\|y_k - J_* s_k\|_2 + \delta_k\|y_{k-1} - J_* s_{k-1}\|_2}{\|\tilde{s}_k\|_2}. \tag{27}
\]
Since $I - \tilde{s}_k \tilde{s}_k^T / (\tilde{s}_k^T \tilde{s}_k)$ is a Euclidean projection matrix,
\[
\left\| I - \frac{\tilde{s}_k \tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k} \right\|_2 = 1. \tag{28}
\]
From Lemmas 3.1 and 3.2, we have
\[
\|y_k - J_* s_k\|_2 = \|F_{k+1} - F_k - J_* s_k\|_2 \le \frac{\gamma}{2}(\|e_{k+1}\|_2 + \|e_k\|_2)\|s_k\|_2, \tag{29}
\]
\[
\delta_k\|y_{k-1} - J_* s_{k-1}\|_2 = \delta_k\|F_k - F_{k-1} - J_* s_{k-1}\|_2 \le \frac{\gamma\delta_k}{2}(\|e_k\|_2 + \|e_{k-1}\|_2)\|s_{k-1}\|_2 \le \frac{\gamma}{4}(\|e_k\|_2 + \|e_{k-1}\|_2)\|s_k\|_2. \tag{30}
\]
By (27)-(30) and Lemma 3.1, we have
\[
\|E_{k+1}\| \le \|E_k\| + \frac{\gamma}{2}(2\|e_{k+1}\|_2 + 3\|e_k\|_2 + \|e_{k-1}\|_2). \tag{31}
\]
Case 2: $k = 0$. We can prove
\[
\|E_{k+1}\| \le \|E_k\| + \frac{\gamma}{2}(\|e_{k+1}\|_2 + \|e_k\|_2). \tag{32}
\]
The proof of this case is similar to that of Case 1, and is omitted. From (31) and (32), we obtain (24). This completes the proof. $\Box$

Lemma 3.4 ([16]). Let $A, B \in \mathbb{R}^{n \times n}$, and let $\|\cdot\|$ be any norm on $\mathbb{R}^{n \times n}$ that obeys $\|AB\| \le \|A\|\|B\|$ and $\|I\| = 1$. If $A$ is nonsingular and $\|A^{-1}(B - A)\| < 1$, then $B$ is nonsingular and
\[
\|B^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}(B - A)\|}. \tag{33}
\]
Theorem 3.5. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in the open convex set $D \subseteq \mathbb{R}^n$. Assume that $\|s_k\|_2 \ne 0$, $\|s_{k-1}\|_2 \ne 0$, and that there exist $x^* \in \mathbb{R}^n$ and $r, \beta > 0$ such that $N(x^*, r) \subset D$, $F(x^*) = 0$, $J(x^*)^{-1}$ exists with $\|J(x^*)^{-1}\|_2 \le \beta$, and $J \in \mathrm{Lip}_\gamma(N(x^*, r))$. Then there exist positive constants $\varepsilon, \delta$ such that if $\|x_0 - x^*\|_2 \le \varepsilon \le r$ and $\|B_0 - J(x^*)\|_F \le \delta$, then the sequence $\{x_k\}$ generated by Algorithm 2.1 is well defined and converges at least linearly to $x^*$.

Proof. Let $\|\cdot\|$ denote the $l_2$ vector norm or Frobenius matrix norm. Let $J_* = J(x^*)$, and choose $\varepsilon, \delta$ such that
\[
6\beta\delta \le 1, \qquad 6\gamma\varepsilon \le \delta. \tag{34}
\]
We show by induction that
\[
\|E_k\| \le (2 - 2^{-k})\delta, \tag{35}
\]
\[
\|e_{k+1}\| \le \frac{\|e_k\|}{2}, \tag{36}
\]
for $k = 0, 1, \cdots$. In brief, the first inequality is proven at each iteration using (24), which gives
\[
\|E_{k+1}\| \le
\begin{cases}
\|E_k\| + \dfrac{\gamma}{2}(\|e_{k+1}\| + \|e_k\|) & \text{if } k = 0, \\[1.5ex]
\|E_k\| + \dfrac{\gamma}{2}(2\|e_{k+1}\| + 3\|e_k\| + \|e_{k-1}\|) & \text{if } k \ge 1.
\end{cases} \tag{37}
\]
For $k = 0$, (35) is trivially true. The proof of (36) is identical to the proof at the induction step, so we omit it here.
For $k = 1$, from the two induction hypotheses, (34), (37) and $\|e_0\| \le \varepsilon$, we have
\[
\|E_1\| \le \|E_0\| + \frac{\gamma}{2}(\|e_1\| + \|e_0\|) \le \delta + \frac{3}{4}\gamma\varepsilon < (2 - 2^{-1})\delta. \tag{38}
\]
Now assume that (35) and (36) hold for $k = 0, 1, \cdots, i-1$ ($i \ge 2$). For $k = i$, we have from (37) and the two induction hypotheses that
\[
\|E_i\| \le \|E_{i-1}\| + \frac{\gamma}{2}(2\|e_i\| + 3\|e_{i-1}\| + \|e_{i-2}\|) \le (2 - 2^{-(i-1)})\delta + \frac{3}{2}\gamma\|e_{i-2}\|. \tag{39}
\]
From (36) and $\|e_0\| \le \varepsilon$, we get $\|e_{i-2}\| \le 2^{-(i-2)}\|e_0\| \le 2^{-(i-2)}\varepsilon$. Substituting this into the inequality (39) and using (34), we have
\[
\|E_i\| \le (2 - 2^{-(i-1)})\delta + \frac{3}{2} \cdot 2^{-(i-2)}\gamma\varepsilon \le (2 - 2^{-(i-1)} + 2^{-i})\delta = (2 - 2^{-i})\delta, \tag{40}
\]
which verifies (35). To verify (36), we must first show that $B_i$ is invertible so that the iteration is well defined. From $\|J_*^{-1}\|_2 \le \beta$, (34) and (35), it follows that
\[
\|J_*^{-1} E_i\|_2 \le \|J_*^{-1}\|_2 \|E_i\| \le \beta(2 - 2^{-i})\delta \le 2\beta\delta \le \frac{1}{3},
\]
so we have from Lemma 3.4 that $B_i$ is nonsingular and
\[
\|B_i^{-1}\|_2 \le \frac{\|J_*^{-1}\|_2}{1 - \|J_*^{-1} E_i\|_2} \le \frac{3\beta}{2}. \tag{41}
\]
Thus $x_{i+1}$ is well defined. By Step 1 of Algorithm 2.1, we have
\[
x_{i+1} = x_i - B_i^{-1} F(x_i).
\]
From $F(x^*) = 0$, we have $B_i(x_{i+1} - x^*) = B_i(x_i - x^*) - F(x_i) + F(x^*)$, or
\[
B_i e_{i+1} = \left[-F(x_i) + F(x^*) + J_* e_i\right] + E_i e_i.
\]
Taking norms, from (41), (35) and Lemma 3.2, it follows that
\[
\|e_{i+1}\| \le \|B_i^{-1}\|_2 \left[\|-F(x_i) + F(x^*) + J_* e_i\| + \|E_i\|\|e_i\|\right] \le \frac{3}{2}\beta\left[\frac{\gamma}{2}\|e_i\| + (2 - 2^{-i})\delta\right]\|e_i\|. \tag{42}
\]
By (36), (34) and $\|e_0\| \le \varepsilon$, we obtain
\[
\frac{\gamma}{2}\|e_i\| \le 2^{-(i+1)}\gamma\varepsilon \le \frac{1}{12} \cdot 2^{-i}\delta,
\]
which, substituted into the inequality (42), gives
\[
\|e_{i+1}\| \le \frac{3}{2}\beta\left[\frac{1}{12} \cdot 2^{-i} + (2 - 2^{-i})\right]\delta\|e_i\| \le 3\beta\delta\|e_i\| \le \frac{1}{2}\|e_i\|. \tag{43}
\]
This proves (36) and completes the proof of linear convergence. $\Box$
Lemma 3.6 ([16]). Let $s \in \mathbb{R}^n$ be nonzero, $E \in \mathbb{R}^{n \times n}$, and let $\|\cdot\|$ denote the $l_2$ vector norm. Then
\[
\left\| E\left(I - \frac{s s^T}{s^T s}\right) \right\|_F = \left[\|E\|_F^2 - \left(\frac{\|Es\|}{\|s\|}\right)^2\right]^{\frac{1}{2}} \le \|E\|_F - \frac{1}{2\|E\|_F}\left(\frac{\|Es\|}{\|s\|}\right)^2.
\]
Lemma 3.7. Let all the hypotheses of Theorem 3.5 hold. Then, for the $l_2$ vector norm,
\[
\lim_{k \to \infty} \frac{\|E_k \tilde{s}_k\|}{\|\tilde{s}_k\|} = 0, \tag{44}
\]
where $E_k$ is defined by (25).

Proof. Let $\|\cdot\|$ denote the $l_2$ vector norm or Frobenius matrix norm. From (26), when $k \ge 1$, we have
\[
\|E_{k+1}\| \le \left\| E_k\left(I - \frac{\tilde{s}_k \tilde{s}_k^T}{\tilde{s}_k^T \tilde{s}_k}\right) \right\| + \frac{\|\tilde{y}_k - J_* \tilde{s}_k\|}{\|\tilde{s}_k\|}. \tag{45}
\]
In the proof of Lemma 3.3, we showed that
\[
\frac{\|\tilde{y}_k - J_* \tilde{s}_k\|}{\|\tilde{s}_k\|} \le \frac{\gamma}{2}(2\|e_{k+1}\| + 3\|e_k\| + \|e_{k-1}\|). \tag{46}
\]
Using this, along with $\|e_{k+1}\| \le \frac{1}{2}\|e_k\| \le \frac{1}{4}\|e_{k-1}\|$ from (36) and Lemma 3.6 in (45), we have
\[
\|E_{k+1}\| \le \|E_k\| - \frac{\|E_k \tilde{s}_k\|^2}{2\|E_k\|\|\tilde{s}_k\|^2} + \frac{3}{2}\gamma\|e_{k-1}\|,
\]
or
\[
\frac{\|E_k \tilde{s}_k\|^2}{\|\tilde{s}_k\|^2} \le 2\|E_k\|\left[\|E_k\| - \|E_{k+1}\| + \frac{3}{2}\gamma\|e_{k-1}\|\right]. \tag{47}
\]
From the proof of Theorem 3.5, it follows that $\|E_k\| \le 2\delta$ for all $k \ge 0$, and
\[
\sum_{k=0}^{\infty} \|e_k\| \le 2\varepsilon. \tag{48}
\]
Thus by (47), we have
\[
\frac{\|E_k \tilde{s}_k\|^2}{\|\tilde{s}_k\|^2} \le 4\delta\left[\|E_k\| - \|E_{k+1}\| + \frac{3}{2}\gamma\|e_{k-1}\|\right], \tag{49}
\]
and summing both sides of the above inequality for $k = 1, 2, \cdots, i$, we have
\[
\sum_{k=1}^{i} \frac{\|E_k \tilde{s}_k\|^2}{\|\tilde{s}_k\|^2} \le 4\delta\left[\|E_1\| - \|E_{i+1}\| + \frac{3}{2}\gamma\sum_{k=1}^{i}\|e_{k-1}\|\right] \le 4\delta\left[\|E_1\| + 3\gamma\varepsilon\right] \le 8\delta^2, \tag{50}
\]
Lemma 3.8. Let all the hypotheses of Theorem 3.5 hold, then for the l2 vector norm, lim kEk sk−1 k = lim kEk−1 sk−1 k. (51) k→∞
k→∞
Proof. Let k · k denote the l2 vector norm or Frobenius matrix norm. From (35), we have kEk k + kEk−1 k ≤ (2 − 2−k )δ + (2 − 2−(k−1) )δ ≤ 4δ. From (36) and ke0 k ≤ ε, we have 3 ksk−1 k ≤ kxk − x k + kxk−1 − x k ≤ kek−1 k ≤ 3 2 ∗
∗
k 1 ε. 2
(52)
(53)
Using (52) and (53), we obtain kEk k + kEk−1 k ksk−1 k k→∞ k 1 ≤ lim 12 δε k→∞ 2 = 0.
lim kEk sk−1 − Ek−1 sk−1 k ≤ lim
k→∞
(54)
This implies (51). Theorem 3.9. Let all the hypotheses of Theorem 3.5 hold, then the sequence {xk } generated by Algorithm 2.1 converges superlinearly to x∗ . Proof. Let k · k denote the l2 vector norm. We first prove kEk sk k = 0. k→∞ ksk k
From (18),
\[
\frac{\|E_k s_k\|}{\|s_k\|} = \frac{\|E_k(\tilde{s}_k + \delta_k s_{k-1})\|}{\|s_k\|} \le \frac{\|E_k \tilde{s}_k\|}{\|s_k\|} + \delta_k\frac{\|E_k s_{k-1}\|}{\|s_k\|}. \tag{56}
\]
From Lemma 3.1, we have
\[
\frac{\|E_k \tilde{s}_k\|}{\|s_k\|} \le \frac{3}{2} \cdot \frac{\|E_k \tilde{s}_k\|}{\|\tilde{s}_k\|}, \tag{57}
\]
and
\[
\delta_k\frac{\|E_k s_{k-1}\|}{\|s_k\|} \le \frac{1}{2} \cdot \frac{\|E_k s_{k-1}\|}{\|s_{k-1}\|}. \tag{58}
\]
Substituting (57) and (58) into the inequality (56), we have
\[
\frac{\|E_k s_k\|}{\|s_k\|} \le \frac{3}{2} \cdot \frac{\|E_k \tilde{s}_k\|}{\|\tilde{s}_k\|} + \frac{1}{2} \cdot \frac{\|E_k s_{k-1}\|}{\|s_{k-1}\|}. \tag{59}
\]
Using (44) and (51), we have
\[
\lim_{k \to \infty} \frac{\|E_k s_k\|}{\|s_k\|} \le \frac{1}{2}\lim_{k \to \infty} \frac{\|E_{k-1} s_{k-1}\|}{\|s_{k-1}\|}. \tag{60}
\]
This implies
\[
\lim_{k \to \infty} \frac{\|E_k s_k\|}{\|s_k\|} = 0. \tag{61}
\]
Using (61), we obtain
\[
\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0. \tag{62}
\]
This shows that Algorithm 2.1 possesses local superlinear convergence. The proof of (62) is similar to that in [16], and is omitted. $\Box$

4. Numerical experiments

In this section, we apply Algorithm 2.1 to solve nonlinear equations. In the test results, Algorithm 2.1 is denoted by MQNM. We compare MQNM with three other quasi-Newton methods, which are similar to MQNM. The only difference between MQNM and the other three methods is the form of the update of $B_{k+1}$:
1. Broyden's good method (BGM), using the update (5).
2. Broyden's bad method (BBM), using the update (6).
3. Combined good-bad method (CGBM), using the update (7).
Our tests are performed on a PC with an Intel Core Duo CPU (3.60 GHz) and 8 GB RAM, using MATLAB 7.12.0.
Table 1: The results of small scale problems

Problem  Dim    BGM           BBM           CGBM          MQNM
               Iter  Time    Iter  Time    Iter  Time    Iter  Time
TP1        2      4  0.0005     4  0.0007     4  0.0009     4  0.0005
TP2        2     12  0.0013    11  0.0011    11  0.0016    12  0.0013
TP3        2     28  0.0014    54  0.0038    27  0.0031    35  0.0022
TP4        2     **  **        **  **        **  **        **  **
TP5        2     **  **        **  **        **  **        **  **
TP6        3     **  **        **  **        **  **        **  **
TP7        3     **  **        **  **        **  **        **  **
TP8        3     13  0.0015    18  0.0018    13  0.0016    12  0.0012
TP9        4     39  0.0037    39  0.0038    39  0.0053    28  0.0025
TP10       4     **  **        **  **        **  **        **  **
TP11       6     **  **        **  **        **  **        **  **
TP12      10      6  0.009      5  0.0008     5  0.001      5  0.0008
TP13       8     39  0.0041    39  0.0042    39  0.0059    28  0.0028
TP14      10     21  0.0018    21  0.0037    21  0.0041    17  0.0014
TP15       3     16  0.0013    16  0.0014    16  0.0016    18  0.0018
TP16       4     10  0.001     20  0.0021    11  0.0018    16  0.0016
TP17      10      6  0.0009     6  0.001      6  0.0013     6  0.001
TP18      10     13  0.0034    13  0.0038    14  0.0047    13  0.0032
TP19       9     21  0.0026    17  0.0021    17  0.002     17  0.0018
TP20      10     31  0.0028    27  0.0035    27  0.004     29  0.0027
TP21      10      1  0.0004     1  0.0004     1  0.0004     1  0.0004
TP22       2     10  0.009     10  0.001     10  0.0013     8  0.0007
TP23      20     10  0.0014    10  0.0013    10  0.0021     9  0.0011
TP24      20     31  0.0032    28  0.0035    29  0.0056    30  0.0033
TP25      20     11  0.0009    15  0.0016    11  0.0019     9  0.0009
TP26      10     14  0.0015    13  0.0017    13  0.0018    17  0.0021
TP27      20      4  0.008      4  0.009      4  0.0011     4  0.0008
TP28      10     33  0.0031    33  0.0032    33  0.0035    24  0.002
TP29      20     49  0.0057    49  0.0059    49  0.0071    34  0.0033

** means that the method failed.
Table 2: The results of medium scale problems

Problem   Dim     BGM            BBM            CGBM           MQNM
                 Iter  Time     Iter  Time     Iter  Time     Iter  Time
TP12       100     6   0.0061     6   0.0062     6   0.0073     6   0.0069
TP12       500     6   0.0605     6   0.0764     6   0.0975     7   0.0956
TP12      1000     6   0.2438     6   0.3214     6   0.3847     7   0.3984
TP13       100    40   0.0358    40   0.0372    40   0.0527    29   0.0214
TP13       500    41   0.3248    41   0.6253    41   0.9825    30   0.2473
TP13      1000    42   1.8524    42   4.0578    42   5.6696    30   1.3428
TP15       100    **   **        **   **        **   **        **   **
TP15       500    **   **        **   **        **   **        **   **
TP15      1000    **   **        **   **        **   **        **   **
TP17       100     7   0.0091     7   0.0108     6   0.0116     7   0.0098
TP17       500     6   0.0692     6   0.1091     6   0.1411     6   0.0654
TP17      1000     6   0.2843     6   0.5947     6   0.6168     6   0.2796
TP19       100    **   **        **   **        **   **        **   **
TP19       500    **   **        **   **        **   **        **   **
TP19      1000    **   **        **   **        **   **        **   **
TP21       100     2   0.0018     2   0.0019     2   0.0034     2   0.0019
TP21       500     5   0.0046     5   0.0082     5   0.0112     5   0.0056
TP21      1000     4   0.1915     4   0.5408     4   0.4438     4   0.2123
TP23       100    10   0.0103    10   0.0094    10   0.0138     9   0.0081
TP23       500    10   0.0871    12   0.1646    12   0.1888     9   0.075
TP23      1000    14   0.5407    12   1.119     13   1.2878    11   0.4156
TP24       100    34   0.0218    32   0.0213    32   0.0304    30   0.0212
TP24       500    35   0.2673    32   0.4961    32   0.7721    34   0.2495
TP24      1000    35   1.3521    33   3.4524    33   5.0208    35   1.3334
TP25       100    11   0.038     19   0.1052    11   0.0791     9   0.0256
TP25       500    11   0.0478    23   0.2458    11   0.1994     9   0.043
TP25      1000    11   0.1994    24   1.2847    11   1.0258     9   0.1555
TP26       100    18   0.0175    15   0.0145    15   0.0139    15   0.0157
TP26       500    18   0.1793    17   0.2665    19   0.3609    20   0.188
TP26      1000    22   1.0851    17   1.6111    18   2.2847    19   0.9715
TP27       100     3   0.0041     3   0.0043     3   0.0045     3   0.0046
TP27       500     3   0.0576     3   0.0625     3   0.0788     3   0.0547
TP27      1000     3   0.2751     3   0.3022     3   0.3526     3   0.3012
TP28       100    28   0.0234    28   0.0193    28   0.0212    20   0.0156
TP28       500    24   0.1843    24   0.3711    24   0.475     18   0.1447
TP28      1000    23   0.8801    23   2.0712    23   2.2124    17   0.6399

** means that the method failed.
[Figure: two panels plotting ρ_s(τ) against τ — performance profiles for the number of iterates (top) and for CPU times (bottom), with curves for BGM, BBM, CGBM and MQNM.]

Figure 1: Performance profile for small scale problems
We test problems 1-22 from [23], problems 23-27 from [10] and problems 28, 29 from [12]. We set $B_0 = J(x_0)$, where $x_0$ is the initial point. The stopping criterion is either (1) $\|F(x_k)\| \le 10^{-15}$ or (2) the number of iterations exceeds 1000. The test problems are summarised in the appendix. Numerical results for the small scale problems are listed in Table 1, where Problem is the test problem, Dim is the dimension of the test problem, Iter stands for the number of iterations, and Time is the CPU execution time in seconds. In order to compare BGM, BBM, CGBM and MQNM on medium scale problems, 12 test problems are enlarged to dimensions 100, 500 and 1000, respectively. Numerical results for the medium scale problems are listed in Table 2. Using the performance profiles described by Dolan and Moré [24], we can demonstrate the overall behavior of BGM, BBM, CGBM and MQNM based on the number of iterates and CPU times.
[Figure: two panels plotting ρ_s(τ) against τ — performance profiles for the number of iterates (top) and for CPU times (bottom), with curves for BGM, BBM, CGBM and MQNM.]

Figure 2: Performance profile for medium scale problems
The performance profile is defined as
\[
\rho_s(\tau) = \frac{1}{|P|}\left|\left\{ p \in P : \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}} \le \tau \right\}\right|,
\]
where $S$ is the set of solvers in comparison on a test set $P$, $|P|$ is the number of test problems, and $t_{p,s}$ is the number of iterations (or the CPU time) required to solve problem $p$ by solver $s$. Figures 1 and 2 show the performance profiles for the small scale and medium scale problems, respectively. As we can see, MQNM gives better results than BGM, BBM and CGBM with respect to both the number of iterates and the CPU times. We also observe that CGBM is more efficient than BGM and BBM in terms of the number of iterates, but less efficient than BGM in terms of CPU times.
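As an aside, the profile $\rho_s(\tau)$ can be computed from a problems-by-solvers cost matrix in a few lines; this is our illustrative sketch, not code from the paper. Failed runs are assumed to be encoded as `np.inf`, and problems on which every solver fails should be removed beforehand.

```python
import numpy as np

def performance_profile(T, taus):
    """T: (num_problems, num_solvers) array of costs (iterations or CPU time),
    with np.inf marking failures. Returns rho of shape (len(taus), num_solvers)."""
    ratios = T / T.min(axis=1, keepdims=True)   # t_{p,s} / min_s t_{p,s}
    taus = np.asarray(taus).reshape(-1, 1, 1)
    # fraction of problems whose ratio is <= tau, for each solver
    return (ratios[None, :, :] <= taus).mean(axis=1)
```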
5. Conclusion

In this paper, we present a modified quasi-Newton method based on a new quasi-Newton equation for solving nonlinear equations. Under mild assumptions, we prove that the modified quasi-Newton method possesses local superlinear convergence. Numerical experiments show that the modified quasi-Newton method is worth further study.

References

[1] C.G. Broyden, The convergence of an algorithm for solving sparse nonlinear systems, Math. Comput. 25 (1971) 285-294.
[2] Y. Saad, M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing 7 (1986) 856-869.
[3] J.B. Resen, J. Kreuser, Iterative Methods for Sparse Linear Systems, SIAM Publications, Philadelphia, PA, second ed., 2003.
[4] C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19 (1965) 577-593.
[5] M. Al-Baali, E. Spedicato, F. Maggioni, Broyden's quasi-Newton methods for a nonlinear system of equations and unconstrained optimization: a review and open problems, Optimization Methods and Software 29 (2014) 937-954.
[6] M.J. Powell, A FORTRAN subroutine for solving systems of nonlinear algebraic equations, in: Numerical Methods for Nonlinear Algebraic Equations (P. Rabinowitz, ed.), Gordon and Breach, London, 1970, pp. 115-161.
[7] J.E. Dennis Jr, J.J. Moré, Quasi-Newton methods, motivation and theory, SIAM Review 19 (1977) 46-89.
[8] J.M. Martínez, L.S. Ochi, Sobre dois métodos de Broyden, Matemática Aplicada e Computacional 1 (1982) 135-141.
[9] J.M. Martínez, Practical quasi-Newton methods for solving nonlinear systems, Journal of Computational and Applied Mathematics 124 (2000) 97-121.
[10] G. Yuan, Z. Wei, X. Lu, A BFGS trust-region method for nonlinear equations, Computing 92 (2011) 317-333.
[11] W. Cheng, D.H. Li, A derivative-free nonmonotone line search and its application to the spectral residual method, IMA Journal of Numerical Analysis 29 (2009) 814-825.
[12] W.L. Cruz, M. Raydan, Nonmonotone spectral methods for large-scale nonlinear systems, Optimization Methods and Software 18 (2003) 583-599.
[13] W.L. Cruz, J. Martínez, M. Raydan, Spectral residual method without gradient information for solving large-scale nonlinear systems of equations, Mathematics of Computation 75 (2006) 1429-1448.
[14] R.B. Schnabel, P.D. Frank, Tensor methods for nonlinear equations, SIAM J. Numer. Anal. 21 (1984) 815-843.
[15] A. Bouaricha, R.B. Schnabel, Tensor methods for large sparse systems of nonlinear equations, Math. Program. 82 (1998) 377-400.
[16] J.E. Dennis Jr, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, 1996.
[17] M.H. Loke, R.D. Barker, Rapid least-squares inversion of apparent resistivity pseudosections by a quasi-Newton method, Geophysical Prospecting 44 (1996) 131-152.
[18] M. Jamshidian, R.I. Jennrich, Acceleration of the EM algorithm by using quasi-Newton methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59 (1997) 569-587.
[19] J.Z. Zhang, N.Y. Deng, L.H. Chen, New quasi-Newton equation and related methods for unconstrained optimization, Journal of Optimization Theory and Applications 102 (1999) 147-167.
[20] J. Zhang, C. Xu, Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations, Journal of Computational and Applied Mathematics 137 (2001) 269-278.
[21] Z. Wei, G. Li, L. Qi, New quasi-Newton methods for unconstrained optimization problems, Applied Mathematics and Computation 175 (2006) 1156-1188.
[22] F. Biglari, M.A. Hassan, W.J. Leong, New quasi-Newton methods via higher order tensor models, Journal of Computational and Applied Mathematics 235 (2011) 2412-2422.
[23] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7 (1981) 17-41.
[24] E.D. Dolan, J.J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming 91 (2002) 201-213.
Appendix A. Test problems

We now list the test problems with the associated starting points $x_s$.

1. Rosenbrock function: $F_1(x) = (f_1(x), f_2(x))^T$, where
\[
f_1(x) = 10(x_2 - x_1^2), \quad f_2(x) = 1 - x_1, \qquad x_s = (-1.2, 1)^T.
\]

2. Freudenstein and Roth function: $F_2(x) = (f_1(x), f_2(x))^T$, where
\[
f_1(x) = -13 + x_1 + ((5 - x_2)x_2 - 2)x_2, \quad f_2(x) = -29 + x_1 + ((x_2 + 1)x_2 - 14)x_2, \qquad x_s = (6, 3)^T.
\]

3. Powell badly scaled function: $F_3(x) = (f_1(x), f_2(x))^T$, where
\[
f_1(x) = 10^4 x_1 x_2 - 1, \quad f_2(x) = e^{-x_1} + e^{-x_2} - 1.0001, \qquad x_s = (0, 1)^T.
\]

4. Brown badly scaled function: $F_4(x) = (f_1(x), f_2(x))^T$, where
\[
f_1(x) = x_1 x_2^2 - 2x_2 + x_1 - 10^6, \quad f_2(x) = x_1^2 x_2 - 2x_1 + x_2 - 2 \cdot 10^{-6}, \qquad x_s = (1, 1)^T.
\]

5. Beale function: $F_5(x) = (f_1(x), f_2(x))^T$, where
\[
f_1(x) = 1.5 - x_1(1 - x_2), \quad f_2(x) = 2.25 - x_1(1 - x_2^2), \qquad x_s = (1, 1)^T.
\]
6. Helical valley function: $F_6(x) = (f_1(x), f_2(x), f_3(x))^T$, where
\[
f_1(x) = 10\left[x_3 - 10\theta(x_1, x_2)\right], \quad f_2(x) = 10\left[\sqrt{x_1^2 + x_2^2} - 1\right], \quad f_3(x) = x_3,
\]
\[
\theta(x_1, x_2) =
\begin{cases}
\dfrac{1}{2\pi}\arctan\left(\dfrac{x_2}{x_1}\right) & \text{if } x_1 > 0, \\[1.5ex]
\dfrac{1}{2\pi}\arctan\left(\dfrac{x_2}{x_1}\right) + 0.5 & \text{if } x_1 < 0,
\end{cases}
\qquad x_s = (-1, 0, 0)^T.
\]

7. Gulf research and development function: $F_7(x) = (f_1(x), f_2(x), f_3(x))^T$, where
\[
f_i(x) = e^{-\frac{|y_i - x_2|^{x_3}}{x_1}} - t_i, \quad \text{for } i = 1, 2, 3,
\]
\[
t_i = \frac{i}{100}, \quad y_i = 25 + (-50\ln(t_i))^{2/3}, \qquad x_s = (5, 2.5, 0.15)^T.
\]

8. Box three-dimensional function: $F_8(x) = (f_1(x), f_2(x), f_3(x))^T$, where
\[
f_i(x) = e^{-0.1ix_1} - e^{-0.1ix_2} - x_3(e^{-0.1i} - e^{-i}), \quad \text{for } i = 1, 2, 3, \qquad x_s = (0, 10, 20)^T.
\]

9. Powell singular function: $F_9(x) = (f_1(x), f_2(x), f_3(x), f_4(x))^T$, where
\[
f_1(x) = x_1 + 10x_2, \quad f_2(x) = \sqrt{5}(x_3 - x_4), \quad f_3(x) = (x_2 - 2x_3)^2, \quad f_4(x) = \sqrt{10}(x_1 - x_4)^2, \qquad x_s = (3, -1, 0, 1)^T.
\]
10. Wood function: $F_{10}(x) = (f_1(x), f_2(x), f_3(x), f_4(x))^T$, where
\[
\begin{aligned}
f_1(x) &= 200x_1(x_1^2 - x_2) + x_1 - 1, \\
f_2(x) &= 100(x_2 - x_1^2) + 10(x_2 + x_4 - 2) + \frac{1}{10}(x_2 - x_4), \\
f_3(x) &= 180x_3(x_3^2 - x_4) + x_3 - 1, \\
f_4(x) &= 90(x_4 - x_3^2) + 10(x_2 + x_4 - 2) - \frac{1}{10}(x_2 - x_4),
\end{aligned}
\qquad x_s = (-3, -1, -3, -1)^T.
\]

11. Biggs EXP6 function: $F_{11}(x) = (f_1(x), f_2(x), \cdots, f_6(x))^T$, where
\[
f_i(x) = x_3 e^{-t_i x_1} - x_4 e^{-t_i x_2} + x_6 e^{-t_i x_5} - y_i, \quad \text{for } i = 1, 2, \cdots, 6,
\]
\[
t_i = 0.1i, \quad y_i = e^{-t_i} - 5e^{-10t_i} + 3e^{-4t_i}, \qquad x_s = (1, 2, 1, 1, 1, 1)^T.
\]

12. Extended Rosenbrock function: $F_{12}(x) = (f_1(x), f_2(x), \cdots, f_n(x))^T$, where, for $i = 1, 2, \cdots, \frac{n}{2}$,
\[
f_{2i-1}(x) = 10(x_{2i} - x_{2i-1}^2), \quad f_{2i}(x) = 1 - x_{2i-1}, \qquad x_s = (-1.2, 1, -1.2, 1, \cdots, -1.2, 1)^T.
\]

13. Extended Powell singular function: $F_{13}(x) = (f_1(x), f_2(x), \cdots, f_n(x))^T$, where, for $i = 1, 2, \cdots, \frac{n}{4}$,
\[
f_{4i-3}(x) = x_{4i-3} + 10x_{4i-2}, \quad f_{4i-2}(x) = \sqrt{5}(x_{4i-1} - x_{4i}), \quad f_{4i-1}(x) = (x_{4i-2} - 2x_{4i-1})^2, \quad f_{4i}(x) = \sqrt{10}(x_{4i-3} - x_{4i})^2,
\]
\[
x_s = (3, -1, 0, 1, 3, -1, 0, 1, \cdots, 3, -1, 0, 1)^T.
\]

14. Variably dimensioned function: $F_{14}(x) = (f_1(x), f_2(x), \cdots, f_n(x))^T$, where
\[
f_i(x) = x_i - 1 + i\sum_{j=1}^{n} j(x_j - 1) + 2i\left(\sum_{j=1}^{n} j(x_j - 1)\right)^3, \quad \text{for } i = 1, 2, \cdots, n, \qquad x_s = \left(\frac{n-1}{n}, \frac{n-2}{n}, \cdots, \frac{1}{n}, 0\right)^T.
\]

15. Trigonometric function: $F_{15}(x) = (f_1(x), f_2(x), \cdots, f_n(x))^T$, where
\[
f_i(x) = n - \sum_{j=1}^{n} \cos x_j + i(1 - \cos x_i) - \sin x_i, \quad \text{for } i = 1, 2, \cdots, n, \qquad x_s = \left(\frac{1}{n}, \frac{1}{n}, \cdots, \frac{1}{n}\right)^T.
\]

16. Brown almost-linear function: $F_{16}(x) = (f_1(x), f_2(x), \cdots, f_n(x))^T$, where
\[
f_i(x) = x_i + \sum_{j=1}^{n} x_j - (n+1), \quad \text{for } i = 1, 2, \cdots, n-1, \qquad f_n(x) = \left(\prod_{j=1}^{n} x_j\right) - 1, \qquad x_s = \left(\frac{1}{2}, \frac{1}{2}, \cdots, \frac{1}{2}\right)^T.
\]

17. Discrete boundary value function: $F_{17}(x) = (f_1(x), f_2(x), \cdots, f_n(x))^T$, where
\[
f_i(x) = 2x_i - x_{i-1} - x_{i+1} + 0.5h^2(x_i + t_i + 1)^3, \quad \text{for } i = 1, 2, \cdots, n,
\]
$h = \frac{1}{n+1}$, $t_i = ih$, and $x_0 = x_{n+1} = 0$; $x_s = (t_1(t_1 - 1), t_2(t_2 - 1), \cdots, t_n(t_n - 1))^T$.
h=
1 ,t n+1 i
j=i+1
= ih and x0 = xn+1 = 0.
xs = (t1 (t1 − 1), t2 (t2 − 2), · · · , tn (tn − 1))T .
19. Broyden tridiagonal function:F19 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where fi (x) = (3 − 2xi )xi − xi−1 − 2xi+1 + 1, x0 = xn+1 = 0,
for i = 1, 2, · · · , n,
xs = (−1, −1, · · · , −1)T .
20. Broyden banded function:F20 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where X xj (1 + xj ), for i = 1, 2, · · · , n, fi (x) = xi (2 + 5x2i ) + 1 − j∈Ji
Ji = {j : j 6= i, max(1, i − ml ) ≤ j ≤ min(n, i + mu )}, ml = 5, mu = 1. xs = (−1, −1, · · · , −1)T
21. Linear function-full rank:F21 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where n 2 X fi (x) = xi − ( xj ) − 1, for i = 1, 2, · · · , n, n j=1 xs = (1, 1, · · · , 1)T .
22. Chebyquad function:F22 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where Z 1 n 1X fi (x) = Ti (xj ) − Ti (x)dx, for i = 1, 2, · · · , n, n j=1 0
Ti is the ith Chebyshev polynomial shifted to the interval [0,1] and hence, Z 1 Ti (x)dx = 0, for i odd, 0 Z 1 −1 , for i even, Ti (x)dx = 2 (i − 1) 0 1 2 n T xs = ( , ,··· , ) . n+1 n+1 n+1 25
23. Logarithmic function:F23 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where xi fi (x) = ln(xi + 1) − , for i = 1, 2, · · · , n, n xs = (1, 1, · · · , 1)T .
24. Strictly convex function 1:F24 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where
fi (x) = exi − 1, for i = 1, 2, · · · , n, 1 2 xs = ( , , · · · , 1)T . n n 25. Penalty function:F25 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where √ fi (x) = 10−5 (xi − 1), for i = 1, 2, · · · , n − 1, n 1 X 2 1 fn (x) = ( ) x − , 4n j=1 j 4
1 1 1 xs = ( , , · · · , )T . 3 3 3 26. Extended Freudenstein and Roth function:F26 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where n f2i−1 (x) = x2i−1 + ((5 − x2i )x2i − 2)x2i − 13, for i = 1, 2, · · · , , 2 n f2i (x) = x2i−1 + ((1 + x2i )x2i − 14)x2i − 29, for i = 1, 2, · · · , , 2 xs = (6, 3, 6, 3, · · · , 6, 3)T . 27. The discretized two-point boundary value function: 1 F27 (x) = Ax + G(x), (n + 1)2 where A is the n × n tridiagonal matrix given by 8 −1 −1 8 −1 .. .. .. . . . , . . . . . . −1 −1 8
and G(x) = (sin x1 − 1, sin x2 − 1, · · · , sin xn − 1)t . xs = (50, 0, 50, 0, · · · , 50, 0)T . 26
(A.1)
28. Exponential function 1:F28 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where f1 (x) = ex1 −1 , fi (x) = i(exi −1 − xi ), for i = 2, 3, · · · , n, n n n T xs = ( , ,··· , ) . n−1 n−1 n−1 29. Zero Jacobian function:F29 (x) = (f1 (x), f2 (x), · · · , fn (x))T , where f1 (x) =
n X
x2j ,
j=1
fi (x) = − 2x1 xi ,
for i = 2, 3, · · · , n,
x1s = 100(n−100)/n, and for all i ≥ 2, xis = (n−1000)(n−500)/(60n)2 .
27