Mathematical and Computer Modelling 50 (2009) 15–20
A PRP type method for systems of monotone equations
Wanyou Cheng
College of Computer, Dongguan University of Technology, Dongguan 523000, China
Article history: Received 20 March 2007; Received in revised form 11 March 2009; Accepted 1 April 2009.
Keywords: Monotone equations; Hyperplane projection method; Global convergence
Abstract. In this paper, we propose an algorithm for solving systems of monotone equations. The method is a combination of the well-known PRP method and the hyperplane projection method. We prove that the proposed method is globally convergent, without any differentiability requirement on the equation, provided the equation is monotone and Lipschitz continuous. Preliminary numerical results show that the proposed method is promising.
1. Introduction

In this paper, we consider the problem of finding a solution of the nonlinear system of equations
$$F(x) = 0, \qquad (1.1)$$
where $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuous and monotone. By monotonicity, we mean
$$\langle F(x) - F(y),\, x - y \rangle \ge 0, \quad \forall x, y \in \mathbb{R}^n.$$
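For instance, any affine map $F(x) = Ax + b$ with a positive semidefinite matrix $A$ is monotone, since $\langle F(x) - F(y),\, x - y \rangle = (x - y)^T A (x - y) \ge 0$; the linear monotone test problems of Section 4 are of this form.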
Nonlinear monotone equations arise in many applications, such as the first-order necessary conditions of unconstrained convex optimization problems and the subproblems in the generalized proximal algorithms with Bregman distances [1]. Some monotone variational inequality problems can also be converted into systems of nonlinear monotone equations by means of fixed point maps or normal maps if the underlying function satisfies some coercive conditions [2].

Different methods have been developed for nonlinear systems of equations. Newton's method and quasi-Newton methods [3–9] are particularly popular because of their local superlinear convergence. However, they are typically unattractive for large-scale nonlinear systems of equations because they need to solve a linear system involving the Jacobian matrix or an approximation of it. The spectral gradient method [10] is easy to implement and very efficient for large-scale unconstrained optimization problems. Recently, La Cruz and Raydan [11] extended the spectral gradient method to large-scale nonlinear systems of equations, introducing the spectral algorithm SANE; global convergence is guaranteed by means of a variation of the nonmonotone strategy of Grippo, Lampariello and Lucidi [12]. La Cruz, Martínez and Raydan [13] proposed a fully derivative-free variant of SANE (DF-SANE), and numerical experiments show that DF-SANE works well for a class of nonlinear systems of equations. Zhang and Zhou [14] combined the spectral gradient method [10] with the projection method of [15] to solve nonlinear monotone equations; the method in [14] is shown to be globally convergent if the nonlinear equations are Lipschitz continuous. We refer to the recent review papers [16,17] for a summary of methods for nonlinear systems of equations.
This work was supported by the NSF of China under grant 10771057.
E-mail address: [email protected].
doi:10.1016/j.mcm.2009.04.007
Conjugate gradient methods are particularly efficient for large-scale unconstrained optimization problems due to their simplicity and low storage requirements [18]. In practical computation, conjugate gradient methods generally perform better than the spectral gradient method. However, the study of conjugate gradient methods for large-scale nonlinear systems of equations is relatively rare. In this paper, as an attempt in this direction, we extend the well-known PRP method [19,20], based on the hyperplane projection method [15], to solve (1.1). We prove its global convergence without any differentiability requirement on the equations. Our preliminary numerical results show that the method is promising.

The paper is organized as follows. In Section 2, after briefly recalling the PRP method for unconstrained optimization and the hyperplane projection method, we present the algorithm. In Section 3, we establish the global convergence of the algorithm. We report some numerical results in the last section. Throughout the paper, $\|\cdot\|$ denotes the Euclidean norm of vectors.

2. Algorithm

In this section, we propose a PRP type method for (1.1). The method is a combination of the well-known PRP method and the hyperplane projection method. Let us first recall the PRP method [19,20] for the unconstrained optimization problem
$$\min f(x), \quad x \in \mathbb{R}^n, \qquad (2.1)$$
where $f$ is smooth. The PRP method for solving (2.1) generates a sequence $\{x_k\}$ by
$$x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, \ldots,$$
where the scalar $\alpha_k$ is called the steplength and is obtained by a line search, and $d_k$ is the search direction determined by
$$d_k = \begin{cases} -g_k, & \text{if } k = 0,\\ -g_k + \beta_{k-1}^{PRP} d_{k-1}, & \text{if } k > 0, \end{cases} \qquad (2.2)$$
where $g_k$ denotes the gradient of $f$ at $x_k$. The parameter $\beta_{k-1}^{PRP}$ in (2.2) is given by
$$\beta_{k-1}^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \text{where } y_{k-1} = g_k - g_{k-1}.$$
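As an illustration, the direction update (2.2) takes only a few lines of code. The following is a minimal Python/NumPy sketch; the function and argument names are illustrative and not from the paper.

```python
import numpy as np

def prp_direction(g, g_prev, d_prev):
    """PRP search direction -g_k + beta_{k-1}^{PRP} d_{k-1}, cf. (2.2)."""
    beta_prp = np.dot(g, g - g_prev) / np.dot(g_prev, g_prev)  # beta_{k-1}^{PRP}
    return -g + beta_prp * d_prev
```

For $k = 0$ one simply takes $d_0 = -g_0$.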
We next recall the hyperplane projection method in [15]. Note that, by the monotonicity of $F$, for any $\bar{x}$ such that $F(\bar{x}) = 0$, we have
$$\langle F(z_k), \bar{x} - z_k \rangle \le 0.$$
Let $x_k$ be the current iterate. By performing a suitable line search procedure along a direction $\bar{d}_k$, a point $z_k = x_k + \alpha_k \bar{d}_k$ can be computed such that
$$\langle F(z_k), x_k - z_k \rangle > 0.$$
Thus the hyperplane
$$H_k = \{x \in \mathbb{R}^n \mid \langle F(z_k), x - z_k \rangle = 0\}$$
strictly separates the current iterate $x_k$ from the zeros of Eq. (1.1). Therefore, it is reasonable to let the next iterate $x_{k+1}$ be the projection of $x_k$ onto this hyperplane.

We now extend the PRP method to solve (1.1). We state the steps of the algorithm as follows.

Algorithm 2.1 (PRP Method).

Step 1. Given an initial point $x_0 \in \mathbb{R}^n$ and constants $\beta \in (0, 1)$, $\eta \in (0, 1)$, $\xi \in (0, 1)$ and $0 < \sigma_{\min} < \sigma_{\max}$. Set the initial steplength $\sigma_0 = 1$. Let $d_0 = -F(x_0)$ and $k := 0$.

Step 2. Determine the steplength $\alpha_k = \sigma_k \beta^{m_k}$, where $m_k$ is the smallest nonnegative integer $m$ satisfying
$$-\langle F(x_k + \sigma_k \beta^m d_k), d_k \rangle \ge \eta \sigma_k \beta^m \|F(x_k + \sigma_k \beta^m d_k)\| \, \|d_k\|^2. \qquad (2.3)$$
Let $z_k = x_k + \alpha_k d_k$. Stop if $\|F(z_k)\| = 0$.

Step 3. Compute the projection of $x_k$ onto $H_k$ by
$$x_{k+1} = x_k - \frac{\langle F(z_k), x_k - z_k \rangle}{\|F(z_k)\|^2} F(z_k). \qquad (2.4)$$
Stop if $\|F(x_{k+1})\| = 0$.

Step 4. Compute $d_{k+1}$ by
$$d_{k+1} = -F_{k+1} + \beta_k^{PRP} d_k, \qquad \text{where } \beta_k^{PRP} = \frac{F_{k+1}^T y_k}{\|F_k\|^2}, \qquad (2.5)$$
$y_k = F_{k+1} - F_k$ and $F_k$ is an abbreviation of $F(x_k)$. If $d_{k+1}^T F_{k+1} > -\xi \|F_{k+1}\|^2$, set $d_{k+1} = -F_{k+1}$.

Step 5. Choose an initial steplength $\sigma_{k+1}$ such that $\sigma_{k+1} \in [\sigma_{\min}, \sigma_{\max}]$. Let $k := k + 1$. Go to Step 2.
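To make the steps above concrete, the following Python/NumPy function is a minimal sketch of Algorithm 2.1. It assumes the exact stopping tests $\|F\| = 0$ are relaxed to a small tolerance, as in Section 4, and it implements Step 5 with the spectral steplength described later in Section 4, simply clamped into $[\sigma_{\min}, \sigma_{\max}]$ (the paper's safeguard replaces out-of-range values by a piecewise formula instead). All names and default values not fixed by Algorithm 2.1 are illustrative.

```python
import numpy as np

def prp_projection_method(F, x0, beta=0.6, eta=1e-4, xi=1e-8,
                          sigma_min=1e-10, sigma_max=1e10,
                          tol=1e-5, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx                              # Step 1: d_0 = -F(x_0)
    sigma = 1.0                          # sigma_0 = 1
    for _ in range(max_iter):
        # Step 2: backtracking line search for alpha_k = sigma_k * beta**m_k, cf. (2.3)
        alpha = sigma
        z = x + alpha * d
        Fz = F(z)
        while -np.dot(Fz, d) < eta * alpha * np.linalg.norm(Fz) * np.dot(d, d):
            alpha *= beta
            z = x + alpha * d
            Fz = F(z)
        if np.linalg.norm(Fz) <= tol:    # stop if ||F(z_k)|| is (numerically) zero
            return z
        # Step 3: project x_k onto the hyperplane H_k, cf. (2.4)
        x_new = x - (np.dot(Fz, x - z) / np.dot(Fz, Fz)) * Fz
        F_new = F(x_new)
        if np.linalg.norm(F_new) <= tol:
            return x_new
        # Step 4: PRP direction, cf. (2.5), with the sufficient-descent safeguard
        y = F_new - Fx
        beta_prp = np.dot(F_new, y) / np.dot(Fx, Fx)
        d_new = -F_new + beta_prp * d
        if np.dot(d_new, F_new) > -xi * np.dot(F_new, F_new):
            d_new = -F_new
        # Step 5: spectral choice of the next initial steplength (see Section 4), clamped here
        s = x_new - x
        sy = np.dot(s, y)
        sigma = np.dot(s, s) / sy if sy > 0 else 1.0
        sigma = min(max(sigma, sigma_min), sigma_max)
        x, Fx, d = x_new, F_new, d_new
    return x
```

As a usage example, for a monotone mapping `F` and a starting point `x0`, the call `prp_projection_method(F, x0)` returns the last iterate, which satisfies $\|F(x)\| \le 10^{-5}$ whenever the method stops within the iteration limit, mirroring the stopping rule used in Section 4.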
Remark. It is easy to see from Step 4 of Algorithm 2.1 that
$$-F_k^T d_k \ge \xi \|F_k\|^2. \qquad (2.6)$$
Indeed, by the continuity of $F$, as $m \to \infty$ the left-hand side of (2.3) tends to $-F_k^T d_k \ge \xi \|F_k\|^2 > 0$ (provided $F_k \ne 0$), while the right-hand side tends to zero. Therefore, after a finite number of reductions of $\alpha_k$, the line search condition (2.3) necessarily holds. Consequently, Algorithm 2.1 is well defined.

3. Convergence property

This section is devoted to the global convergence of Algorithm 2.1. We first give two preliminary lemmas. The following lemma is from [15].

Lemma 3.1. Let $F$ be monotone and let $x, y \in \mathbb{R}^n$ satisfy $\langle F(y), x - y \rangle > 0$. Let
$$x^+ = x - \frac{\langle F(y), x - y \rangle}{\|F(y)\|^2} F(y).$$
Then for any $\bar{x} \in \mathbb{R}^n$ such that $F(\bar{x}) = 0$, it holds that
$$\|x^+ - \bar{x}\|^2 \le \|x - \bar{x}\|^2 - \|x^+ - x\|^2.$$

Lemma 3.2. Suppose that $F$ is Lipschitz continuous and that the sequence $\{x_k\}$ generated by Algorithm 2.1 is bounded. If there exists a constant $\varepsilon > 0$ such that $\|F(x_k)\| \ge \varepsilon$ for all $k$ and
$$\lim_{k \to \infty} \alpha_k \|d_k\| = 0, \qquad (3.1)$$
then there exists a positive constant $M$ such that
$$\|d_k\| \le M, \quad \forall k.$$
Proof. Since $F$ is Lipschitz continuous and the sequence $\{x_k\}$ is bounded, there exists a positive constant $\gamma_1$ such that for all $k$
$$\|F_k\| \le \gamma_1. \qquad (3.2)$$
From the definition of $d_k$ in Step 4 of Algorithm 2.1, we have
$$\|d_k\| \le \|F_k\| + |\beta_{k-1}^{PRP}| \, \|d_{k-1}\|. \qquad (3.3)$$
By the Lipschitz continuity of $F$, using (3.1), (3.2) and the condition $\|F(x_k)\| \ge \varepsilon$, we obtain
$$|\beta_{k-1}^{PRP}| \le \frac{L \alpha_{k-1} \|F_k\| \, \|d_{k-1}\|}{\|F_{k-1}\|^2} \to 0 \quad \text{as } k \to \infty,$$
where $L$ is the Lipschitz constant. The last inequality together with (3.3) implies that there exist a constant $\varepsilon_0 \in (0, 1)$ and an integer $k_0$ such that for all $k > k_0$
$$\|d_k\| \le \gamma_1 + \varepsilon_0 \|d_{k-1}\| \le \gamma_1 (1 + \varepsilon_0 + \varepsilon_0^2 + \cdots + \varepsilon_0^{k-k_0-1}) + \varepsilon_0^{k-k_0} \|d_{k_0}\| \le \frac{\gamma_1}{1 - \varepsilon_0} + \|d_{k_0}\|.$$
Setting $M = \max\{\|d_0\|, \|d_1\|, \ldots, \|d_{k_0}\|, \frac{\gamma_1}{1 - \varepsilon_0} + \|d_{k_0}\|\}$, we get the conclusion.
Now we establish a global convergence theorem for Algorithm 2.1.

Theorem 3.1. Let $\{x_k\}$ be generated by Algorithm 2.1. Suppose that $F$ is monotone and Lipschitz continuous and that the solution set of (1.1) is not empty. Then for any $\bar{x}$ such that $F(\bar{x}) = 0$, it holds that
$$\|x_{k+1} - \bar{x}\|^2 \le \|x_k - \bar{x}\|^2 - \|x_{k+1} - x_k\|^2. \qquad (3.4)$$
In particular, $\{x_k\}$ is bounded. Furthermore, either $\{x_k\}$ is finite and the last iterate is a solution, or the sequence is infinite and $\lim_{k\to\infty} \|x_{k+1} - x_k\| = 0$. Moreover, $\{x_k\}$ converges to some solution of (1.1).

Proof. We first note that if the algorithm terminates at some iteration $k$, then $\|F(z_k)\| = 0$ or $\|F(x_k)\| = 0$, which means that $z_k$ or $x_k$ is a solution of Eq. (1.1). Suppose now that $\|F(z_k)\| \ne 0$ and $\|F(x_k)\| \ne 0$ for all $k$, so that an infinite sequence $\{x_k\}$ is generated. It follows from (2.3) that
$$\langle F(z_k), x_k - z_k \rangle = -\alpha_k \langle F(z_k), d_k \rangle \ge \eta \, \|F(z_k)\| \, \alpha_k^2 \|d_k\|^2 > 0. \qquad (3.5)$$
Let $\bar{x}$ be any point such that $F(\bar{x}) = 0$. By (2.4), (3.5) and Lemma 3.1, we obtain
$$\|x_{k+1} - \bar{x}\|^2 \le \|x_k - \bar{x}\|^2 - \|x_{k+1} - x_k\|^2. \qquad (3.6)$$
Hence the sequence $\{\|x_k - \bar{x}\|\}$ is decreasing and convergent. In particular, the sequence $\{x_k\}$ is bounded, and
$$\lim_{k \to \infty} \|x_{k+1} - x_k\| = 0. \qquad (3.7)$$
By (2.4) and (3.5), we obtain
$$\|x_{k+1} - x_k\| = \frac{\langle F(z_k), x_k - z_k \rangle}{\|F(z_k)\|} \ge \eta \alpha_k^2 \|d_k\|^2.$$
The last inequality together with (3.7) implies
$$\lim_{k \to \infty} \alpha_k \|d_k\| = 0. \qquad (3.8)$$
We claim that $\liminf_{k\to\infty} \|F_k\| = 0$. In fact, if $\liminf_{k\to\infty} \|F_k\| > 0$, then by (3.8) and Lemma 3.2 there exists a positive constant $M$ such that
$$\|d_k\| \le M, \quad \forall k. \qquad (3.9)$$
Applying the Cauchy–Schwarz inequality to (2.6), we obtain $\|d_k\| \ge \xi \|F_k\|$. So we have
$$\liminf_{k \to \infty} \|d_k\| > 0.$$
This together with (3.8) means $\lim_{k\to\infty} \alpha_k = 0$. By the line search rule, for all $k$ sufficiently large, the steplength $\sigma_k \beta^{m_k - 1}$ does not satisfy (2.3). This means
$$-\langle F(x_k + \sigma_k \beta^{m_k - 1} d_k), d_k \rangle < \eta \sigma_k \beta^{m_k - 1} \|F(x_k + \sigma_k \beta^{m_k - 1} d_k)\| \, \|d_k\|^2. \qquad (3.10)$$
The boundedness of $\{x_k\}$ implies that there exist an accumulation point $\hat{x}$ and an infinite index set $K_1$ such that $\lim_{k \in K_1} x_k = \hat{x}$. It follows from (3.9) that the sequence $\{d_k\}_{k \in K_1}$ is also bounded. Therefore there exist an infinite index set $K_2 \subset K_1$ and an accumulation point $\hat{d}$ such that $\lim_{k \in K_2} d_k = \hat{d}$. Taking the limit in (3.10) for $k \in K_2$, we obtain
$$-\langle F(\hat{x}), \hat{d} \rangle \le 0.$$
However, it is easy to see from (2.6) that
$$-\langle F(\hat{x}), \hat{d} \rangle > 0.$$
This yields a contradiction. Hence we have $\liminf_{k\to\infty} \|F_k\| = 0$. By the continuity of $F$ and the boundedness of $\{x_k\}$, it is clear that the sequence $\{x_k\}$ has some accumulation point $\tilde{x}$ such that $F(\tilde{x}) = 0$. From (3.6), the sequence $\{\|x_k - \tilde{x}\|\}$ is convergent, and since $\tilde{x}$ is an accumulation point of $\{x_k\}$, it must hold that $\{x_k\}$ converges to $\tilde{x}$. The proof is complete.

4. Numerical results

In this section, we tested Algorithm 2.1 and compared it with the spectral method in [14]. We implemented Algorithm 2.1 with the following parameters: $\beta = 0.6$, $\xi = 10^{-8}$ and $\eta = 10^{-4}$. The initial steplength in Step 2 of Algorithm 2.1 is set to the spectral coefficient
$$\sigma_{k+1} = \frac{s_k^T s_k}{s_k^T y_k},$$
where $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. By the monotonicity and the Lipschitz continuity of $F$, it is not difficult to show that $0 \le y_k^T s_k \le L s_k^T s_k$, where $L$ is the Lipschitz constant. If $\sigma_k \not\in [\sigma_{\min}, \sigma_{\max}]$, we replace the spectral coefficient by
$$\sigma_k = \begin{cases} 1, & \text{if } \|F(x_k)\| > 1,\\ \|F(x_k)\|^{-1}, & \text{if } 10^{-5} \le \|F(x_k)\| \le 1,\\ 10^{5}, & \text{if } \|F(x_k)\| < 10^{-5}, \end{cases}$$
where $\sigma_{\min} = 10^{-10}$ and $\sigma_{\max} = 10^{10}$. This safeguarding strategy for the spectral coefficient is the same as the one described in [13]. We stop the iteration if the iteration number exceeds 1000 or the inequality
$$\|F(x_k)\| \le 10^{-5} \quad \text{or} \quad \|F(z_k)\| \le 10^{-5}$$
is satisfied. The spectral method in [14] was implemented with the following parameters: $\beta = 0.4$, $\sigma = 0.01$ and $r = 0.001$. Its stopping criterion is $\|F(x_k)\| \le 10^{-5}$ or the iteration number exceeding 1000. The codes were written in FORTRAN 90 with double precision arithmetic and run on a PC (CPU 3.0 GHz, 512 MB memory) with a Windows operating system.

According to an anonymous referee's suggestion, we pay particular attention to the performance of the two algorithms on linear monotone systems of equations. We test the following problems with various dimensions and different initial points.
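In code, the safeguarded spectral choice of the initial steplength described above might look as follows. This is a sketch under the assumption that the residual used in the safeguard is the one at the new iterate; the function name is illustrative and not from the paper.

```python
import numpy as np

def spectral_steplength(x, x_new, Fx, Fx_new, sigma_min=1e-10, sigma_max=1e10):
    """sigma_{k+1} = s_k^T s_k / s_k^T y_k with the safeguard of Section 4."""
    s = x_new - x
    y = Fx_new - Fx
    sy = np.dot(s, y)                                 # 0 <= s^T y <= L s^T s by monotonicity
    sigma = np.dot(s, s) / sy if sy > 0 else np.inf
    if sigma_min <= sigma <= sigma_max:
        return sigma
    # Out-of-range coefficient: replace it by the piecewise value from Section 4
    r = np.linalg.norm(Fx_new)
    if r > 1.0:
        return 1.0
    elif r >= 1e-5:
        return 1.0 / r
    else:
        return 1e5
```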
Table 4.1
Test results for Problem 1.

Dim     Initial   Method 1                       Method 2
                  Iter   Fun   Time              Iter   Fun   Time
100     x0        24     54    0.6094D−03        42     128   0.1031D−02
        x1        25     56    0.6250D−03        44     134   0.1063D−02
        x2        24     53    0.5937D−03        37     113   0.9531D−03
        x3        21     47    0.5469D−03        33     100   0.8125D−03
        x4        22     49    0.5469D−03        38     117   0.9531D−03
        x5        22     50    0.5469D−03        36     110   0.8906D−03
        x6        20     44    0.5000D−03        30     90    0.7344D−03
        x7        23     51    0.5625D−03        37     113   0.9063D−03
1000    x0        27     59    0.6578D−02        44     134   0.1059D−01
        x1        26     57    0.6328D−02        44     134   0.1058D−01
        x2        24     54    0.5984D−02        39     119   0.9391D−02
        x3        22     49    0.5437D−02        35     106   0.8391D−02
        x4        24     55    0.6063D−02        38     116   0.9172D−02
        x5        21     47    0.5188D−02        37     113   0.8922D−02
        x6        21     47    0.5250D−02        37     113   0.8922D−02
        x7        23     51    0.5672D−02        40     122   0.9687D−02
5000    x0        30     66    0.3695D−01        46     140   0.5572D−01
        x1        30     66    0.3670D−01        47     144   0.5784D−01
        x2        28     62    0.3436D−01        41     125   0.5016D−01
        x3        25     56    0.3094D−01        36     109   0.4314D−01
        x4        24     56    0.3055D−01        39     119   0.4863D−01
        x5        25     57    0.3136D−01        39     119   0.5023D−01
        x6        22     49    0.2752D−01        35     106   0.4295D−01
        x7        27     60    0.3402D−01        41     125   0.5017D−01
Problem 1. Function $F$ is given by $F(x) = A_1 x + b_1$, where $b_1 = (-1, -1, \ldots, -1)^T$ and
$$A_1 = \begin{pmatrix} 5 & 2 & & & \\ 1 & 5 & 2 & & \\ & 1 & \ddots & \ddots & \\ & & \ddots & \ddots & 2 \\ & & & 1 & 5 \end{pmatrix}.$$
Problem 2. Function $F$ is given by $F(x) = A_2 x + b_2$, where $b_2 = (-1, -2, \ldots, -n)^T$ and
$$A_2 = \begin{pmatrix} 5 & 2 & & & \\ 3 & 5 & 2 & & \\ & 3 & \ddots & \ddots & \\ & & \ddots & \ddots & 2 \\ & & & 3 & 5 \end{pmatrix}.$$
We note that Problem 1 is symmetric while Problem 2 is nonsymmetric. The results are listed in Tables 4.1 and 4.2, where $x_0 = (10, 10, \ldots, 10)^T$, $x_1 = (-10, -10, \ldots, -10)^T$, $x_2 = (1, 1, \ldots, 1)^T$, $x_3 = (-1, -1, \ldots, -1)^T$, $x_4 = (1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n})^T$, $x_5 = (0.1, 0.1, \ldots, 0.1)^T$, $x_6 = (\frac{1}{n}, \frac{2}{n}, \ldots, 1)^T$ and $x_7 = (1 - \frac{1}{n}, 1 - \frac{2}{n}, \ldots, 0)^T$. In Tables 4.1 and 4.2, we report the dimension of each test problem (Dim), the number of iterations (Iter), the number of function evaluations (Fun) and the CPU time in seconds (Time). Following an anonymous referee's suggestion, we ran each problem 1000 times from the same initial point; the CPU time reported in Tables 4.1 and 4.2 is the average value. In the tables, ''Method 1'' and ''Method 2'' denote Algorithm 2.1 and the spectral method in [14], respectively.
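For reference, the residual of Problem 2 can be evaluated without forming $A_2$ explicitly, using only its three bands. The sketch below assumes the tridiagonal reading of $A_2$ given above (subdiagonal 3, diagonal 5, superdiagonal 2); the helper name is illustrative, not from the paper.

```python
import numpy as np

def make_problem2(n):
    """Return F(x) = A_2 x + b_2 for the tridiagonal A_2 and b_2 = (-1, -2, ..., -n)^T."""
    b = -np.arange(1, n + 1, dtype=float)
    def F(x):
        Ax = 5.0 * x
        Ax[:-1] += 2.0 * x[1:]    # superdiagonal entries (value 2)
        Ax[1:] += 3.0 * x[:-1]    # subdiagonal entries (value 3)
        return Ax + b
    return F

# Example: Problem 2 with n = 1000 started from x0 = (10, ..., 10)^T
F = make_problem2(1000)
x0 = 10.0 * np.ones(1000)
```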
Table 4.2
Test results for Problem 2.

Dim     Initial   Method 1                       Method 2
                  Iter   Fun   Time              Iter   Fun   Time
100     x0        70     290   0.2641D−02        118    429   0.3266D−02
        x1        64     285   0.2547D−02        113    407   0.3125D−02
        x2        69     312   0.2734D−02        118    426   0.3203D−02
        x3        51     205   0.1875D−02        113    414   0.3078D−02
        x4        70     320   0.2812D−02        114    416   0.3141D−02
        x5        74     339   0.2938D−02        114    414   0.3187D−02
        x6        69     317   0.2766D−02        123    440   0.3344D−02
        x7        36     118   0.1125D−02        119    424   0.3250D−02
1000    x0        72     313   0.2739D−01        138    504   0.3852D−01
        x1        75     328   0.2875D−01        136    503   0.3777D−01
        x2        78     337   0.2937D−01        140    506   0.3819D−01
        x3        77     336   0.2930D−01        139    508   0.3919D−01
        x4        76     329   0.2886D−01        141    511   0.3925D−01
        x5        73     313   0.2792D−01        135    497   0.3756D−01
        x6        78     339   0.2980D−01        137    503   0.4028D−01
        x7        70     300   0.2636D−01        139    507   0.3892D−01
5000    x0        111    529   0.2317D+00        159    574   0.2188D+00
        x1        106    501   0.2149D+00        159    579   0.2188D+00
        x2        111    532   0.2290D+00        156    574   0.2175D+00
        x3        105    507   0.2167D+00        156    567   0.2144D+00
        x4        117    556   0.2385D+00        164    591   0.2271D+00
        x5        118    558   0.2401D+00        154    568   0.2124D+00
        x6        103    489   0.2105D+00        160    583   0.2228D+00
        x7        107    508   0.2178D+00        154    561   0.2156D+00
From Tables 4.1 and 4.2, we can see that Method 1 performs better than Method 2, so the proposed method is promising. Moreover, during the numerical experiments we found that the restart step $d_k = -F(x_k)$ never occurred for $k > 0$; in other words, the condition $d_k^T F_k \le -\xi \|F_k\|^2$ was always satisfied.

Acknowledgments

The author is grateful to the two anonymous referees for their helpful suggestions and comments. The author also thanks Professor Dong-Hui Li for his valuable suggestions and the revision of the paper.

References

[1] A.N. Iusem, M.V. Solodov, Newton-type methods with generalized distances for constrained optimization, Optimization 41 (1997) 257–278.
[2] Y.B. Zhao, D. Li, Monotonicity of fixed point and normal mapping associated with variational inequality and its application, SIAM J. Optim. 4 (2001) 962–973.
[3] P.N. Brown, Y. Saad, Convergence theory of nonlinear Newton–Krylov algorithms, SIAM J. Optim. 4 (1994) 297–330.
[4] C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Math. Comput. 19 (1965) 577–593.
[5] M. Gasparo, A nonmonotone hybrid method for nonlinear systems, Optim. Methods Softw. 13 (2000) 79–94.
[6] A. Griewank, The global convergence of Broyden-like methods with suitable line search, J. Austral. Math. Soc. Ser. B 28 (1986) 75–92.
[7] D.H. Li, M. Fukushima, A globally and superlinearly convergent Gauss–Newton-based BFGS method for symmetric nonlinear equations, SIAM J. Numer. Anal. 37 (1999) 152–172.
[8] D.H. Li, M. Fukushima, A derivative-free line search and global convergence of Broyden-like method for nonlinear equations, Optim. Methods Softw. 13 (2000) 583–599.
[9] J.M. Martínez, A family of quasi-Newton methods for nonlinear equations with direct secant updates of matrix factorizations, SIAM J. Numer. Anal. 27 (1990) 1034–1049.
[10] J. Barzilai, J.M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal. 8 (1988) 141–148.
[11] W. La Cruz, M. Raydan, Nonmonotone spectral methods for large-scale nonlinear systems, Optim. Methods Softw. 18 (2003) 583–599.
[12] L. Grippo, F. Lampariello, S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM J. Numer. Anal. 23 (1986) 707–716.
[13] W. La Cruz, J.M. Martínez, M. Raydan, Spectral residual method without gradient information for solving large-scale nonlinear systems of equations, Math. Comput. 75 (2006) 1429–1448.
[14] L. Zhang, W. Zhou, Spectral gradient projection method for solving nonlinear monotone equations, J. Comput. Appl. Math. 196 (2006) 478–484.
[15] M.V. Solodov, B.F. Svaiter, A globally convergent inexact Newton method for systems of monotone equations, in: M. Fukushima, L. Qi (Eds.), Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, Kluwer Academic Publishers, 1998, pp. 355–369.
[16] D.H. Li, W.Y. Cheng, Recent progress in the global convergence of quasi-Newton methods for nonlinear equations, Hokkaido Math. J. 36 (2007) 729–743.
[17] J.M. Martínez, Practical quasi-Newton methods for solving nonlinear systems, J. Comput. Appl. Math. 124 (2000) 97–121.
[18] W.W. Hager, H. Zhang, A survey of nonlinear conjugate gradient methods, Pacific J. Optim. 2 (2006) 35–58.
[19] E. Polak, G. Ribière, Note sur la convergence de méthodes de directions conjuguées, Rev. Française Informat. Recherche Opérationnelle 16 (1969) 35–43.
[20] B.T. Polyak, The conjugate gradient method in extreme problems, USSR Comput. Math. Math. Phys. 9 (1969) 94–112.