Convergence properties of a class of nonlinear conjugate gradient methods


Computers & Operations Research 40 (2013) 2656–2661


Jinkui Liu
School of Mathematics and Statistics, Chongqing Three Gorges University, Chongqing, China

Available online 1 June 2013

Abstract

Conjugate gradient methods are an important class of methods for unconstrained optimization problems, especially when the dimension is large. In this paper, we study a class of modified conjugate gradient methods based on the famous LS conjugate gradient method, each of which produces a sufficient descent direction at every iteration and converges globally provided that the line search satisfies the strong Wolfe conditions. At the same time, a new specific nonlinear conjugate gradient method is constructed. Our numerical results show that the new method is very efficient on the given test problems in comparison with the famous LS method, the PRP method and the CG-DESCENT method.

Keywords: Unconstrained optimization; Conjugate gradient method; Strong Wolfe line search; Descent property; Convergence property

1. Introduction

In [2], Liu and Storey proposed the famous LS nonlinear conjugate gradient method (the LS method). An important property of the LS method is its good numerical performance, but its global convergence has not been fully established under Wolfe-type line search conditions. The purpose of this paper is to study a class of conjugate gradient methods related to the LS method. Consider the following unconstrained optimization problem:

$$ \min_{x \in \mathbb{R}^n} f(x), \qquad (1.1) $$

where $f(x)$ is smooth and its gradient $g(x)$ is available. Conjugate gradient methods are efficient for such problems and have the following iteration form:

$$ x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = \begin{cases} -g_k, & \text{for } k = 1, \\ -g_k + \beta_k d_{k-1}, & \text{for } k \ge 2, \end{cases} \qquad (1.2) $$

where $g_k = \nabla f(x_k)$, $\alpha_k > 0$ is a step length determined by some line search, $d_k$ is the search direction and $\beta_k$ is a scalar. The formula for $\beta_k$ should be chosen so that the method reduces to the linear conjugate gradient method when $f(x)$ is a strictly convex quadratic and the line search is exact. Some well-known formulas for $\beta_k$ are the Fletcher–Reeves (FR) [1], Liu–Storey (LS) [2], Polak–Ribière–Polyak (PRP) [3,4] and Hager–Zhang (HZ) [5] formulas, i.e.,

$$ \beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2} \ [1], \qquad \beta_k^{LS} = -\frac{g_k^T y_{k-1}}{d_{k-1}^T g_{k-1}} \ [2], \qquad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2} \ [3,4], $$

$$ \beta_k^{HZ} = \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T \frac{g_k}{d_{k-1}^T y_{k-1}} \ [5], $$

where $\|\cdot\|$ denotes the Euclidean norm and $y_{k-1} = g_k - g_{k-1}$. The corresponding conjugate gradient methods are generally referred to as the FR, LS, PRP and HZ methods. Obviously, if $f$ is a strictly convex quadratic function and the line search is exact, the above methods are equivalent. In the past few years, the LS and PRP methods have been regarded as among the most efficient conjugate gradient methods in practical computation, which has led to extensive research on them; see [2–4,6–11].
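To make the iteration (1.2) concrete, the sketch below implements the generic nonlinear conjugate gradient loop with the FR, LS, PRP and HZ formulas as interchangeable choices of $\beta_k$. It is only an illustration, not the code used in the experiments of Section 3; the use of scipy.optimize.line_search for the Wolfe step, the tolerances and the restart safeguard are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import line_search   # Wolfe line search (an assumption for this sketch)

def beta_fr(g, g_prev, d_prev):           # Fletcher-Reeves [1]
    return g.dot(g) / g_prev.dot(g_prev)

def beta_ls(g, g_prev, d_prev):           # Liu-Storey [2]
    return -g.dot(g - g_prev) / d_prev.dot(g_prev)

def beta_prp(g, g_prev, d_prev):          # Polak-Ribiere-Polyak [3,4]
    return g.dot(g - g_prev) / g_prev.dot(g_prev)

def beta_hz(g, g_prev, d_prev):           # Hager-Zhang [5]
    y = g - g_prev
    dy = d_prev.dot(y)
    return (y - 2.0 * d_prev * y.dot(y) / dy).dot(g) / dy

def nonlinear_cg(f, grad, x0, beta_rule=beta_ls, tol=1e-6, max_iter=10000):
    """Generic method (1.1)-(1.2) with a pluggable beta formula."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                            # d_1 = -g_1
    for k in range(1, max_iter + 1):
        if np.linalg.norm(g, np.inf) <= tol:
            break
        alpha, *_ = line_search(f, grad, x, d, gfk=g, c1=1e-4, c2=0.1)
        if alpha is None:                             # line search failure: restart along -g
            d, alpha = -g, 1e-6
        x = x + alpha * d                             # x_{k+1} = x_k + alpha_k d_k
        g_prev, g = g, grad(x)
        d = -g + beta_rule(g, g_prev, d) * d          # d_{k+1} = -g_{k+1} + beta_{k+1} d_k
    return x, k

# Example: a strictly convex quadratic, on which all four rules behave like linear CG
# under an exact line search.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda z: 0.5 * z.dot(A @ z) - b.dot(z)
grad = lambda z: A @ z - b
print(nonlinear_cg(f, grad, np.zeros(2)))
```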

Hager and Zhang [5] discussed the global convergence of the HZ method for strongly convex functions under the Wolfe line search, i.e., $\alpha_k$ satisfies

$$ f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad (1.3) $$

$$ g(x_k + \alpha_k d_k)^T d_k \ge s\, g_k^T d_k, \qquad (1.4) $$

where $0 < \delta < s < 1$. In order to prove global convergence for general functions, Hager and Zhang modified the parameter $\beta_k^{HZ}$ as

$$ \beta_k^{MHZ} = \max\{\beta_k^{HZ}, \eta_k\}, \qquad (1.5) $$

where $\eta_k = -1/\left(\|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\}\right)$ and $\eta > 0$. The method corresponding to (1.5) is the famous CG-DESCENT method.

Gilbert and Nocedal [12] investigated the global convergence properties of FR-related methods whose $\beta_k$ satisfies $|\beta_k| \le \beta_k^{FR}$, provided that the line search satisfies the strong Wolfe conditions, i.e., $\alpha_k$ satisfies (1.3) and

$$ |g(x_k + \alpha_k d_k)^T d_k| \le -s\, g_k^T d_k, \qquad (1.6) $$

where $0 < \delta < s < 1$. The above observation motivates us to construct a class of conjugate gradient methods in which $\beta_k$ satisfies

$$ |\beta_k| \le t_k \beta_k^{LS}, \qquad (1.7) $$

where

$$ t_k = \frac{u}{s(1+\mu_k)}, \qquad \mu_k = \frac{|g_k^T g_{k-1}|}{\|g_k\|^2}, \qquad 0 < u < \frac{1}{2}, \quad s \in (0,1). $$
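To fix ideas, the helper below evaluates the quantities appearing in (1.3)–(1.7) for one pair of consecutive iterates: it checks the sufficient decrease condition (1.3), the strong curvature condition (1.6), and returns the bound $t_k \beta_k^{LS}$ that a member of the class (1.7) must respect. This is a minimal sketch assuming $f$ and its gradient are available as Python callables; the values $\delta = 10^{-4}$, $s = 0.1$ and $u = 0.45$ are illustrative choices, the last one anticipating Section 3.

```python
import numpy as np

def strong_wolfe_and_bound(f, grad, x_prev, d_prev, alpha, delta=1e-4, s=0.1, u=0.45):
    """Check (1.3) and (1.6) for a trial alpha and return the bound t_k * beta_k^LS of (1.7)."""
    g_prev = grad(x_prev)
    gd_prev = g_prev.dot(d_prev)                    # g_{k-1}^T d_{k-1} (negative for a descent direction)
    x = x_prev + alpha * d_prev
    g = grad(x)
    sufficient_decrease = f(x) <= f(x_prev) + delta * alpha * gd_prev    # (1.3)
    strong_curvature = abs(g.dot(d_prev)) <= -s * gd_prev                # (1.6)
    beta_ls = -g.dot(g - g_prev) / gd_prev          # LS formula [2]
    mu = abs(g.dot(g_prev)) / g.dot(g)              # mu_k = |g_k^T g_{k-1}| / ||g_k||^2
    t = u / (s * (1.0 + mu))                        # t_k from (1.7)
    return sufficient_decrease, strong_curvature, t * beta_ls

# Tiny illustration on a convex quadratic with a steepest descent step.
f = lambda z: 0.5 * z.dot(z)
grad = lambda z: z
x0 = np.array([1.0, -2.0])
print(strong_wolfe_and_bound(f, grad, x0, -grad(x0), alpha=0.95))
```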

In Section 2, we investigate the global convergence properties of the new modified conjugate gradient method under the strong Wolfe line search. In Section 3, we give a specific nonlinear conjugate gradient method which originates from this modified class, and some numerical results are reported.

2. The main results

In this section, we always assume that $\|g_k\| \ne 0$ for all $k$, for otherwise a stationary point has already been found. At the same time, in order to guarantee the global convergence of the new method, we make the following assumption on the objective function $f(x)$.

Assumption (H).

(i) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_1)\}$ is bounded, where $x_1$ is the starting point.
(ii) In some neighborhood $V$ of $\Omega$, $f$ is differentiable and its gradient $g$ is Lipschitz continuous, namely, there exists a constant $L > 0$ such that

$$ \|g(x) - g(y)\| \le L\|x - y\| \quad \text{for all } x, y \in V. $$

Obviously, from Assumption (H), we know that there exists a constant $\tilde r > 0$ such that

$$ \|g(x)\| \le \tilde r \quad \text{for all } x \in V. \qquad (2.1) $$

Lemma 2.1. Consider any method (1.1)–(1.2), where $\beta_k$ satisfies (1.7) and $\alpha_k$ satisfies the strong Wolfe line search (1.3) and (1.6). Then

$$ g_k^T d_k \le -(1-u)\|g_k\|^2. \qquad (2.2) $$

Proof. We prove the conclusion by induction. Since $\|g_1\|^2 = -g_1^T d_1$ and $u \in (0, 1/2)$, the conclusion (2.2) holds for $k = 1$. Now assume that the conclusion is true for $k-1$ and $g_k \ne 0$; then $g_{k-1}^T d_{k-1} < 0$. We need to prove that the result holds for $k$. Multiplying (1.2) by $g_k^T$, we have

$$ g_k^T d_k = -\|g_k\|^2 + \beta_k g_k^T d_{k-1}. $$

Then from (1.7) and (1.6), we get

$$ g_k^T d_k \le -\|g_k\|^2 + |\beta_k|\,|g_k^T d_{k-1}| \le -\|g_k\|^2 + t_k \beta_k^{LS}\,|g_k^T d_{k-1}| \le -\|g_k\|^2 + \frac{u\|g_k\|^2}{s(\|g_k\|^2 + |g_k^T g_{k-1}|)} \cdot \frac{\|g_k\|^2 + |g_k^T g_{k-1}|}{-g_{k-1}^T d_{k-1}} \cdot (-s\, g_{k-1}^T d_{k-1}) = -(1-u)\|g_k\|^2. $$

Hence the conclusion (2.2) also holds for $k$. $\Box$

According to (2.2) and $0 < u < \frac{1}{2}$, we also have

$$ \|g_k\|^2 < -2 g_k^T d_k. \qquad (2.3) $$

We now state a general convergence result, which was essentially proved by Zoutendijk [13]; it is important in the convergence analyses of nonlinear optimization methods.

Lemma 2.2. Suppose Assumption (H) holds. Consider any method (1.1)–(1.2), where $d_k$ satisfies $g_k^T d_k < 0$ for $k \in \mathbb{N}^+$ and $\alpha_k$ satisfies the Wolfe line search (1.3) and (1.4). Then

$$ \sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty. \qquad (2.4) $$

The strong Wolfe line search is a special case of the Wolfe line search, so Lemma 2.2 also holds under the strong Wolfe line search.

Lemma 2.3. Consider any method (1.1)–(1.2), where $\beta_k$ satisfies (1.7) and $\alpha_k$ satisfies the strong Wolfe line search (1.3) and (1.6). If there exists a constant $r > 0$ such that

$$ \|g_k\| \ge r \quad \text{for all } k \ge 1, \qquad (2.5) $$

then we have

$$ \sum_{k \ge 2} \|u_k - u_{k-1}\|^2 < +\infty, \qquad \text{where } u_k = \frac{d_k}{\|d_k\|}. $$

Proof. From (2.2), we know that $d_k \ne 0$ for all $k \in \mathbb{N}^+$. Define the quantities

$$ r_k = \frac{-g_k}{\|d_k\|} \qquad \text{and} \qquad \delta_k = \frac{\beta_k \|d_{k-1}\|}{\|d_k\|}. $$

We first prove that $1 + \delta_k \ne 0$. Obviously, this holds when $\beta_k \ge 0$. To show that it also holds when $\beta_k < 0$, we argue by contradiction. Suppose that $1 + \delta_k = 0$, i.e., $-\beta_k\|d_{k-1}\| = \|d_k\|$. Since $\beta_k < 0$, we then have $\|\beta_k d_{k-1}\| = \|d_k\|$. From (1.2) we have $\|d_k + g_k\| = \|\beta_k d_{k-1}\|$, and therefore $\|d_k + g_k\| = \|d_k\|$. Squaring this equality, we get $\|g_k\|^2 = -2 g_k^T d_k$, which contradicts (2.3). Hence $1 + \delta_k \ne 0$ always holds when $\beta_k$ satisfies (1.7) under the strong Wolfe line search, i.e., there exists a constant $\rho > 0$ such that

$$ |1 + \delta_k| \ge \rho. \qquad (2.6) $$

From (1.2), we have

$$ u_k = \frac{d_k}{\|d_k\|} = \frac{-g_k + \beta_k d_{k-1}}{\|d_k\|} = r_k + \delta_k u_{k-1}. $$

According to the definition of $u_k$, we have $\|u_k\| = 1$, and then

$$ \|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|. \qquad (2.7) $$

By (2.6) and (2.7), we have

$$ \|u_k - u_{k-1}\| = \frac{1}{|1+\delta_k|}\,\|(1+\delta_k)(u_k - u_{k-1})\| \le \frac{1}{|1+\delta_k|}\big(\|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\|\big) \le \frac{2}{\rho}\,\|r_k\|. \qquad (2.8) $$

From (2.2), (2.4), (2.5) and (2.8), we have

$$ \frac{\rho^2 r^2 (1-u)^2}{4} \sum_{k \ge 2} \|u_k - u_{k-1}\|^2 \le (1-u)^2 \sum_{k \ge 1} \|g_k\|^2\,\|r_k\|^2 = (1-u)^2 \sum_{k \ge 1} \frac{\|g_k\|^4}{\|d_k\|^2} \le \sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty. $$

Then the conclusion of Lemma 2.3 holds. $\Box$


Lemma 2.4. Suppose Assumption (H) holds. If (2.5) holds, then the parameter (1.7) has Property (*) under the strong Wolfe line search, i.e.,

(i) there exists a constant $b > 1$ such that $|\beta_k| \le b$;
(ii) there exists a constant $\lambda > 0$ such that $\|x_k - x_{k-1}\| \le \lambda$ implies $|\beta_k| \le \frac{1}{2b}$.

Proof. From Assumption (H), we know that (2.1) holds. By (1.7), (2.1), (2.2) and (2.5), we have

$$ |\beta_k| \le t_k \beta_k^{LS} \le \frac{u\|g_k\|^2}{s(\|g_k\|^2 + |g_k^T g_{k-1}|)} \cdot \frac{\|g_k\|^2 + |g_k^T g_{k-1}|}{(1-u)\|g_{k-1}\|^2} \le \frac{\tilde r^2}{s(1-u)r^2} = b. $$

Define $\lambda = \frac{s(1-u)r^2}{2L\tilde r b}$. If $\|x_k - x_{k-1}\| \le \lambda$, then from (1.7), (2.1), (2.2), (2.5) and Assumption (H)(ii) we have

$$ |\beta_k| \le t_k \beta_k^{LS} \le \frac{u}{s(1+\mu_k)} \cdot \frac{\|g_k\| \cdot \|g_k - g_{k-1}\|}{(1-u)\|g_{k-1}\|^2} \le \frac{L\lambda \tilde r}{s(1-u)r^2} = \frac{1}{2b}. \qquad \Box $$

Lemma 2.5. Suppose Assumption (H) holds. Consider any method (1.1)–(1.2), where $\beta_k$ satisfies (1.7) and $\alpha_k$ satisfies the strong Wolfe line search. If (2.5) holds, then there exists $\lambda > 0$ such that, for any $\Delta \in \mathbb{Z}^+$ and $k_0 \in \mathbb{Z}^+$, there is an index $k \ge k_0$ with

$$ |\mathfrak{R}^{\lambda}_{k,\Delta}| > \frac{\Delta}{2}, $$

where $\mathfrak{R}^{\lambda}_{k,\Delta} \triangleq \{ i \in \mathbb{Z}^+ : k \le i \le k+\Delta-1,\ \|x_i - x_{i-1}\| \ge \lambda \}$ and $|\mathfrak{R}^{\lambda}_{k,\Delta}|$ denotes the number of elements of $\mathfrak{R}^{\lambda}_{k,\Delta}$.

Proof. Obviously, Lemma 2.1, Lemma 2.2 and Lemma 2.4 all hold. Then Lemma 2.5 can be proved in the same way as Lemma 3.3.2 of [14]. $\Box$

Theorem 2.1. Suppose Assumption (H) holds. Consider any method (1.1)–(1.2), in which $\alpha_k$ satisfies the strong Wolfe line search and $\beta_k$ satisfies (1.7). Then

$$ \liminf_{k \to +\infty} \|g_k\| = 0. \qquad (2.9) $$

Proof. We obtain the result by contradiction. Suppose that (2.9) does not hold; then there exists a constant $r > 0$ such that (2.5) holds, and Lemma 2.3, Lemma 2.4 and Lemma 2.5 all hold. Define the quantities

$$ u_k = \frac{d_k}{\|d_k\|} \qquad \text{and} \qquad s_{i-1} = x_i - x_{i-1}. $$

For any $l, k \in \mathbb{Z}^+$ with $l \ge k$, we have

$$ x_l - x_{k-1} = \sum_{i=k}^{l} \|x_i - x_{i-1}\|\, u_{i-1} = \sum_{i=k}^{l} \|s_{i-1}\|\, u_{k-1} + \sum_{i=k}^{l} \|s_{i-1}\| (u_{i-1} - u_{k-1}), $$

i.e.,

$$ \sum_{i=k}^{l} \|s_{i-1}\|\, u_{k-1} = (x_l - x_{k-1}) - \sum_{i=k}^{l} \|s_{i-1}\| (u_{i-1} - u_{k-1}). \qquad (2.10) $$

According to Assumption (H)(i), there exists a constant $\upsilon > 0$ such that $\|x\| \le \upsilon$ for all $x \in V$. From (2.10), we have

$$ \sum_{i=k}^{l} \|s_{i-1}\| \le 2\upsilon + \sum_{i=k}^{l} \|s_{i-1}\| \cdot \|u_{i-1} - u_{k-1}\|. \qquad (2.11) $$

Let $\Delta$ be a positive integer with $\Delta \in [8\upsilon/\lambda,\ 8\upsilon/\lambda + 1)$, where $\lambda$ is defined in Lemma 2.5. From the result of Lemma 2.3, we know that there exists $k_0$ such that

$$ \sum_{i \ge k_0} \|u_{i+1} - u_i\|^2 \le \frac{1}{4\Delta}. \qquad (2.12) $$

From the Cauchy–Schwarz inequality and (2.12), for all $i \in [k, k+\Delta-1]$ with $k \ge k_0$ we have

$$ \|u_{i-1} - u_{k-1}\| \le \sum_{j=k}^{i-1} \|u_j - u_{j-1}\| \le (i-k)^{1/2} \left( \sum_{j=k}^{i-1} \|u_j - u_{j-1}\|^2 \right)^{1/2} \le \Delta^{1/2} \left( \frac{1}{4\Delta} \right)^{1/2} = \frac{1}{2}. \qquad (2.13) $$

By Lemma 2.5, we know that there exists $k \ge k_0$ such that

$$ |\mathfrak{R}^{\lambda}_{k,\Delta}| > \frac{\Delta}{2}. \qquad (2.14) $$

It follows from (2.11), (2.13) and (2.14) that

$$ \frac{\lambda \Delta}{4} < \frac{\lambda}{2}\, |\mathfrak{R}^{\lambda}_{k,\Delta}| \le \frac{1}{2} \sum_{i=k}^{k+\Delta-1} \|s_{i-1}\| \le 2\upsilon. $$

From the above inequality we have $\Delta < 8\upsilon/\lambda$, which contradicts the definition of $\Delta$. Hence

$$ \liminf_{k \to +\infty} \|g_k\| = 0. \qquad \Box $$

3. Numerical results

In this section, we first give the specific algorithm which originates from the new modified conjugate gradient method, as follows.

New method:

Step 1: Choose $x_1 \in \mathbb{R}^n$ and $\varepsilon \ge 0$; set $d_1 = -g_1$ and $k = 1$. If $\|g_1\| \le \varepsilon$, stop.
Step 2: Compute $\alpha_k$ by some line search.
Step 3: Let $x_{k+1} = x_k + \alpha_k d_k$ and $g_{k+1} = g(x_{k+1})$. If $\|g_{k+1}\| \le \varepsilon$, stop.
Step 4: Generate $d_{k+1}$ by (1.2), where $\beta_{k+1}$ is computed by

$$ \beta_{k+1} = \begin{cases} \beta_{k+1}^{LS}, & \text{if } t_{k+1} \ge 1, \\ t_{k+1} \beta_{k+1}^{LS}, & \text{otherwise}, \end{cases} \qquad \text{where } t_{k+1} = \frac{u}{s(1+\mu_{k+1})}, \quad \mu_{k+1} = \frac{|g_{k+1}^T g_k|}{\|g_{k+1}\|^2}, \quad u = 0.45. $$

Step 5: Set $k = k + 1$ and go to Step 2.
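A compact Python sketch of Steps 1–5 is given below. It is an illustrative reimplementation, not the author's code used for Table 1: the strong Wolfe step of Step 2 is delegated to scipy.optimize.line_search, the parameter $s$ is identified with that routine's curvature constant c2, and the choices $s = 0.1$, $\varepsilon = 10^{-6}$ and the restart safeguard are assumptions for the example.

```python
import numpy as np
from scipy.optimize import line_search   # Wolfe line search (assumed adequate for Step 2)

def new_method(f, grad, x1, eps=1e-6, u=0.45, s=0.1, max_iter=20000):
    """Sketch of the new method of Section 3: the LS beta truncated via t_k from (1.7)."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d, k = -g, 1                                        # Step 1: d_1 = -g_1
    while np.linalg.norm(g, np.inf) > eps and k <= max_iter:
        alpha, *_ = line_search(f, grad, x, d, gfk=g, c1=1e-4, c2=s)   # Step 2
        if alpha is None:                               # safeguard, not part of the paper
            d, alpha = -g, 1e-6
        x_new = x + alpha * d                           # Step 3
        g_new = grad(x_new)
        beta_ls = -g_new.dot(g_new - g) / d.dot(g)      # beta^LS_{k+1}, LS formula [2]
        mu = abs(g_new.dot(g)) / g_new.dot(g_new)       # mu_{k+1} = |g_{k+1}^T g_k| / ||g_{k+1}||^2
        t = u / (s * (1.0 + mu))                        # t_{k+1}
        beta = beta_ls if t >= 1.0 else t * beta_ls     # Step 4: truncation rule
        d = -g_new + beta * d                           # d_{k+1} by (1.2)
        x, g, k = x_new, g_new, k + 1                   # Step 5
    return x, k

# Example run on the two-dimensional Rosenbrock function (problem ROSENBR in Table 1).
f = lambda z: (1.0 - z[0])**2 + 100.0 * (z[1] - z[0]**2)**2
grad = lambda z: np.array([-2.0*(1.0 - z[0]) - 400.0*z[0]*(z[1] - z[0]**2),
                           200.0*(z[1] - z[0]**2)])
print(new_method(f, grad, np.array([-1.2, 1.0])))
```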

We compare the performance of the new method with that of the LS method, the PRP method and the CG-DESCENT method under the strong Wolfe line search. The test problems come from the CUTEr library. For the sake of fairness, we use the code (CG_DESCENT C-6.5) downloaded from the website http://www.math.ufl.edu/~hager/, and the parameters in the line search are those given in the code. The iteration is stopped when the inequality $\|g_k\|_\infty \le 10^{-6}$ is satisfied. All codes were run on a PC with a 2.0 GHz CPU, 512 MB of memory and the Windows XP operating system. The numerical results are reported in Table 1, where the column "Problem" gives the problem's name and "Dim" denotes its dimension. The detailed results are listed in the form NI/CPU, where NI and CPU denote the number of iterations and the CPU time, respectively. For a particular problem, one method is said to perform better than another if its CPU time, or its number of iterations, is smaller. Since we need to evaluate the overall behaviour of the methods, we adopt the performance profiles of Dolan and Moré [15].
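The protocol above (stop when $\|g_k\|_\infty \le 10^{-6}$, report NI/CPU) can be mimicked with a small harness such as the one below. It is only a sketch: the problems dictionary, the solver interface (each solver returning the final point and its iteration count, as in the new_method sketch above) and the timing granularity are assumptions, since the actual experiments used the CUTEr problems and the CG_DESCENT C code.

```python
import time

def run_benchmark(problems, solvers, eps=1e-6):
    """Collect NI/CPU pairs in the format of Table 1 (assumed interfaces)."""
    results = {}
    for prob_name, (f, grad, x0) in problems.items():
        for solver_name, solver in solvers.items():
            t0 = time.perf_counter()
            _, ni = solver(f, grad, x0, eps=eps)            # stop when ||g_k||_inf <= eps
            cpu = time.perf_counter() - t0
            results[(prob_name, solver_name)] = (ni, cpu)   # reported as NI/CPU
    return results
```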


Table 1
The numerical results (NI/CPU) of the new method, LS method, PRP method and CG-DESCENT method.

Problem | Dim | New | LS | PRP | CG-DESCENT
AKIVA | 2 | 16/0.002 | 53/0.003 | 13/0.001 | 15/0.001
ALLINITU | 4 | 11/0.001 | 14/0.001 | 15/0.001 | 12/0.001
ARGLINA | 200 | 1/0.002 | 1/0.004 | 1/0.004 | 1/0.003
ARGLINB | 200 | 5/0.004 | 5/0.003 | 5/0.007 | 4/0.006
ARWHEAD | 5000 | 14/0.021 | 33/0.054 | 21/0.040 | 10/0.010
BARD | 3 | 10/0.000 | 10/0.000 | 10/0.000 | 12/0.000
BDQRTIC | 5000 | 122/0.189 | 164/0.198 | 185/0.265 | 136/0.173
BEALE | 2 | 13/0.001 | 19/0.001 | 11/0.001 | 12/0.001
BIGGS6 | 6 | 22/0.000 | 24/0.000 | 27/0.000 | 27/0.000
BOX3 | 3 | 12/0.000 | 11/0.000 | 5/0.000 | 9/0.000
BOX | 10,000 | 19/0.067 | 39/0.096 | 24/0.087 | 13/0.045
BROWNAL | 200 | 13/0.003 | 13/0.006 | 13/0.005 | 17/0.007
BROWNBS | 2 | 24/0.001 | 26/0.001 | 17/0.000 | 11/0.000
BROWNDEN | 4 | 15/0.001 | 12/0.001 | 16/0.001 | 19/0.001
BROYDN7D | 5000 | 1173/2.762 | 1173/2.988 | 2314/4.280 | 1598/3.004
BRYBND | 5000 | 75/0.053 | 62/0.043 | 41/0.034 | 63/0.056
CHAINWOO | 4000 | 208/0.156 | 208/0.130 | 238/0.165 | 318/0.228
CHNROSNB | 50 | 196/0.002 | 562/0.015 | 375/0.009 | 287/0.002
CLIFF | 2 | 16/0.000 | 14/0.000 | 23/0.000 | 15/0.000
COSINE | 10,000 | 13/0.027 | 18/0.039 | 15/0.048 | 16/0.051
CRAGGLVY | 5000 | 86/0.099 | 96/0.120 | 127/0.187 | 100/0.146
CUBE | 2 | 41/0.001 | 37/0.001 | 25/0.001 | 34/0.001
DECONVU | 61 | 315/0.009 | 618/0.018 | 526/0.010 | 481/0.007
DENSCHNA | 2 | 5/0.000 | 8/0.000 | 8/0.000 | 7/0.000
DENSCHNB | 2 | 5/0.000 | 8/0.000 | 5/0.000 | 5/0.000
DENSCHNC | 2 | 13/0.000 | 27/0.000 | 15/0.000 | 10/0.000
DENSCHND | 3 | 62/0.000 | 79/0.000 | 65/0.000 | 51/0.000
DENSCHNE | 3 | 19/0.000 | 15/0.000 | 12/0.000 | 16/0.000
DENSCHNF | 2 | 10/0.000 | 16/0.000 | 13/0.000 | 11/0.000
DIXMAANA | 3000 | 9/0.003 | 9/0.004 | 9/0.003 | 9/0.003
DIXMAANB | 3000 | 8/0.002 | 7/0.001 | 12/0.001 | 7/0.001
DIXMAAND | 3000 | 8/0.002 | 9/0.003 | 7/0.003 | 7/0.003
DIXMAANE | 3000 | 261/0.116 | 381/0.312 | 484/0.326 | 235/0.114
DIXMAANF | 3000 | 185/0.098 | 417/0.364 | 392/0.241 | 172/0.064
DIXMAANG | 3000 | 325/0.100 | 223/0.202 | 428/0.123 | 139/0.041
DIXMAANH | 3000 | 234/0.071 | 330/0.0102 | 249/0.081 | 189/0.060
DIXMAANI | 3000 | 3468/1.812 | 4926/2.895 | 6574/2.967 | 3962/1.669
DIXMAANJ | 3000 | 395/0.165 | 208/0.056 | 268/0.089 | 320/0.100
DIXMAANK | 15 | 62/0.001 | 95/0.005 | 42/0.001 | 49/0.001
DIXMAANL | 3000 | 196/0.023 | 379/0.087 | 168/0.018 | 239/0.077
DQDRTIC | 5000 | 11/0.009 | 21/0.010 | 16/0.009 | 8/0.007
DQRTIC | 5000 | 18/0.003 | 17/0.002 | 12/0.002 | 14/0.005
EDENSCH | 2000 | 20/0.007 | 35/0.004 | 38/0.009 | 26/0.009
EG2 | 1000 | 8/0.002 | 17/0.002 | 19/0.003 | 6/0.001
EIGENBLS | 2550 | 1215/125.039 | 2537/149.165 | 3641/121.894 | 1423/110.748
ENGVAL1 | 5000 | 36/0.047 | 48/0.057 | 21/0.050 | 27/0.030
ENGVAL2 | 3 | 33/0.001 | 46/0.002 | 32/0.001 | 26/0.000
ERRINROS | 50 | 361/0.003 | 753/0.011 | 864/0.010 | 393/0.003
EXPFIT | 2 | 8/0.000 | 7/0.000 | 12/0.000 | 10/0.000
EXTROSNB | 1000 | 2716/0.628 | 4162/0.619 | 3164/0.489 | 3808/0.562
FLETCHCR | 1000 | 140/0.021 | 387/0.079 | 237/0.042 | 164/0.028
FMINSRF2 | 5625 | 392/0.216 | 492/0.387 | 671/0.425 | 371/0.268
FMINSURF | 5625 | 350/0.511 | 512/0.544 | 735/0.814 | 473/0.512
FREUROTH | 5000 | 34/0.050 | 62/0.043 | 22/0.036 | 25/0.045
GENHUMPS | 5000 | 12316/31.857 | 20816/52.489 | 16708/31.246 | 8612/16.512
GENROSE | 500 | 845/0.087 | 1562/0.108 | 3491/0.520 | 1316/0.101
GROWTHLS | 3 | 126/0.001 | 58/0.001 | 120/0.001 | 143/0.001
GULF | 3 | 29/0.003 | 45/0.007 | 38/0.006 | 43/0.005
HAIRY | 2 | 31/0.003 | 89/0.004 | 29/0.002 | 36/0.003
HATFLDD | 3 | 25/0.001 | 25/0.001 | 28/0.001 | 20/0.000
HATFLDE | 3 | 36/0.001 | 36/0.001 | 41/0.001 | 30/0.000
HATFLDFL | 3 | 21/0.000 | 46/0.000 | 26/0.000 | 39/0.000
HEART6LS | 6 | 449/0.002 | 532/0.004 | 481/0.003 | 684/0.005
HEART8LS | 8 | 216/0.002 | 618/0.006 | 357/0.003 | 249/0.001
HELIX | 3 | 39/0.001 | 54/0.001 | 25/0.000 | 23/0.000
HIELOW | 3 | 12/0.010 | 10/0.007 | 15/0.009 | 16/0.034
HILBERTB | 10 | 4/0.001 | 4/0.001 | 4/0.001 | 4/0.001
HIMMELBB | 2 | 15/0.001 | 19/0.001 | 8/0.001 | 10/0.001
HIMMELBF | 4 | 18/0.001 | 18/0.001 | 27/0.001 | 35/0.001
HIMMELBH | 2 | 7/0.000 | 7/0.000 | 7/0.000 | 7/0.000
HUMPS | 2 | 66/0.001 | 66/0.003 | 84/0.002 | 53/0.001
JENSMP | 2 | 15/0.000 | 15/0.000 | 15/0.000 | 15/0.000
JIMACK | 3549 | 7120/1032.581 | 9648/1240.461 | 12713/1140.413 | 8916/956.527
KOWOSB | 4 | 17/0.000 | 17/0.000 | 15/0.000 | 17/0.000
LIARWHD | 5000 | 26/0.025 | 80/0.060 | 41/0.027 | 30/0.020


Table 1 (continued)

Problem | Dim | New | LS | PRP | CG-DESCENT
LOGHAIRY | 2 | 20/0.002 | 37/0.009 | 48/0.008 | 27/0.001
MANCINO | 100 | 18/0.085 | 26/0.081 | 37/0.091 | 11/0.078
MARATOSB | 2 | 749/0.005 | 749/0.006 | 864/0.006 | 1265/0.004
MEXHAT | 2 | 31/0.000 | 31/0.000 | 17/0.000 | 20/0.000
MOREBV | 5000 | 184/0.356 | 189/0.281 | 213/0.351 | 161/0.245
MSQRTALS | 1024 | 2851/3.186 | 3781/4.156 | 2016/3.027 | 3160/4.852
MSQRTBLS | 1024 | 5108/7.988 | 5018/8.151 | 4610/6.294 | 2856/3.754
NCB20B | 5000 | 2480/42.861 | 2480/52.156 | 5219/90.411 | 3089/45.378
NCB20 | 5010 | 3745/56.418 | 3155/49.115 | 2306/44.561 | 4437/50.164
NONCVXU2 | 5000 | 6631/8.126 | 9161/13.495 | 11203/15.421 | 7386/8.724
NONDQUAR | 5000 | 1981/0.906 | 3624/1.156 | 1985/0.628 | 2178/0.876
OSBORNEA | 5 | 88/0.004 | 88/0.005 | 72/0.004 | 96/0.003
OSBORNEB | 11 | 49/0.005 | 49/0.009 | 116/0.010 | 62/0.005
PARKCH | 15 | 592/25.498 | 687/25.485 | 528/22.810 | 765/22.278
PENALTY1 | 1000 | 31/0.005 | 80/0.015 | 127/0.023 | 35/0.004
PENALTY2 | 200 | 94/0.015 | 92/0.011 | 110/0.017 | 191/0.037
PENALTY3 | 200 | 76/0.714 | 148/0.815 | 62/0.518 | 99/0.836
POWELLSG | 5000 | 55/0.050 | 55/0.061 | 34/0.031 | 26/0.027
POWER | 10,000 | 329/0.198 | 461/0.209 | 673/0.225 | 462/0.275
QUARTC | 5000 | 42/0.018 | 51/0.032 | 35/0.027 | 17/0.010
ROSENBR | 2 | 10/0.000 | 13/0.000 | 21/0.000 | 34/0.000
S308 | 2 | 17/0.000 | 15/0.001 | 17/0.001 | 8/0.000
SCHMVETT | 5000 | 28/0.033 | 40/0.039 | 36/0.043 | 48/0.097
SENSORS | 100 | 37/0.182 | 23/0.126 | 18/0.107 | 21/0.245
SINEVAL | 2 | 71/0.010 | 64/0.012 | 95/0.014 | 60/0.001
SINQUAD | 5000 | 23/0.045 | 56/0.069 | 38/0.056 | 17/0.038
SNAIL | 2 | 41/0.000 | 48/0.000 | 66/0.000 | 97/0.000
SPARSQUR | 10,000 | 42/0.125 | 286/0.208 | 274/0.138 | 28/0.083
SPMSRTLS | 4999 | 307/0.517 | 378/0.548 | 461/0.648 | 203/0.354
SROSENBR | 5000 | 8/0.005 | 8/0.008 | 8/0.009 | 11/0.012
STRATEC | 10 | 459/5.236 | 518/5.894 | 372/6.124 | 176/4.014
TESTQUAD | 5000 | 1167/0.962 | 2167/0.916 | 2746/0.830 | 1867/0.504
TOINTGOR | 50 | 153/0.020 | 93/0.010 | 114/0.012 | 135/0.011
TOINTPSP | 50 | 149/0.007 | 378/0.019 | 186/0.010 | 143/0.006
TOINTQOR | 50 | 45/0.002 | 88/0.003 | 92/0.004 | 29/0.001
TRIDIA | 5000 | 916/0.818 | 916/0.916 | 1370/0.819 | 817/0.419
VARDIM | 200 | 39/0.001 | 25/0.001 | 32/0.002 | 10/0.001
VAREIGVL | 50 | 22/0.001 | 29/0.001 | 16/0.001 | 23/0.001
VIBRBEAM | 8 | 108/0.005 | 176/0.011 | 143/0.008 | 153/0.009
WATSON | 12 | 43/0.001 | 58/0.001 | 79/0.002 | 51/0.001
WOODS | 4000 | 62/0.012 | 27/0.009 | 16/0.019 | 27/0.030
YFITU | 3 | 67/0.001 | 67/0.003 | 106/0.002 | 84/0.001


We use these profiles to compare the new method with the LS, PRP and CG-DESCENT methods with respect to both CPU time and the number of iterations. However, some CPU times are zero, so we make the following provision: we take the average CPU time of each method, denoted av(new), av(LS), av(PRP) and av(CG-DESCENT), and add the corresponding average to the CPU time of every problem. Fig. 1 shows the performance profiles [15] with respect to CPU time for each method. We plot the fraction P of problems for which a method is within a factor τ of the best time. The left side of the figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that are successfully solved by each method. The top curve corresponds to the method that solved the most problems in a time within a factor τ of the best time. Using the same approach, we also compare the numbers of iterations; see Fig. 2. Figs. 1 and 2 show that the new method performs better than the other methods with respect to CPU time, performs similarly to the CG-DESCENT method with respect to the number of iterations, and is superior to the LS and PRP methods with respect to the number of iterations. All results show that the efficiency of the new method is encouraging, which also means that this research is meaningful.
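The construction of Figs. 1 and 2 can be reproduced from Table 1 with a few lines of code. The sketch below computes Dolan–Moré performance profiles [15] from a matrix of per-problem costs (CPU times or iteration counts); the function name and data layout are assumptions for illustration, and zero CPU times should first be shifted by the per-method average, as described above.

```python
import numpy as np

def performance_profile(costs, taus):
    """costs: (n_problems, n_solvers) array; returns P_s(tau) for each tau and solver s.

    P_s(tau) is the fraction of problems on which solver s is within a factor tau
    of the best (smallest) cost, following Dolan and More [15].
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)            # best cost on each problem
    ratios = costs / best                              # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(costs.shape[1])]
                     for tau in taus])

# Tiny illustration with three problems and two solvers (illustrative numbers).
cpu = [[0.012, 0.009],
       [0.189, 0.198],
       [0.027, 0.039]]
print(performance_profile(cpu, taus=[1.0, 1.5, 2.0]))
```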

Fig. 1. Performance profiles with respect to the CPU time.


Fig. 2. Performance profiles with respect to the number of iterations.

Acknowledgments

The author wishes to express his heartfelt thanks to the referees and the editor for their detailed and helpful suggestions for revising the manuscript. This work was supported by the Natural Science Foundation of Chongqing Education Committee (KJ121112).

References

[1] Fletcher R, Reeves C. Function minimization by conjugate gradients. Computer Journal 1964;7:149–54.
[2] Liu Y, Storey C. Efficient generalized conjugate gradient algorithms, Part 1: Theory. Journal of Optimization Theory and Applications 1992;69:129–37.


[3] Polak E, Ribière G. Note sur la convergence de méthodes de directions conjuguées. Revue Française d'Informatique et de Recherche Opérationnelle 1969;16:35–43.
[4] Polyak BT. The conjugate gradient method in extreme problems. USSR Computational Mathematics and Mathematical Physics 1969;9:94–112.
[5] Hager WW, Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization 2005;16:170–92.
[6] Yu Gaohang, Zhao Yanlin, Wei Zengxin. A descent nonlinear conjugate gradient method for large-scale unconstrained optimization. Applied Mathematics and Computation 2007;187:636–43.
[7] Powell MJD. Nonconvex minimization calculations and the conjugate gradient method. In: Numerical analysis (Dundee, 1983), Lecture Notes in Mathematics, vol. 1066. Berlin: Springer; 1984. p. 122–41.
[8] Zhang L, Zhou W, Li D-H. A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA Journal of Numerical Analysis 2006;26:629–40.
[9] Gilbert JC, Nocedal J. Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 1992;2:21–42.
[10] Wei Z, Li GY, Qi L. Global convergence of the Polak–Ribière–Polyak conjugate gradient method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems. Mathematics of Computation 2008;77:2173–93.
[11] Grippo L, Lucidi S. A globally convergent version of the Polak–Ribière conjugate gradient method. Mathematical Programming 1997;78:375–91.
[12] Gilbert JC, Nocedal J. Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 1992;2:21–42.
[13] Zoutendijk G. Nonlinear programming, computational methods. In: Abadie J, editor. Integer and nonlinear programming. Amsterdam: North-Holland; 1970. p. 37–86.
[14] Dai YH, Yuan Y. Nonlinear conjugate gradient methods. Shanghai: Shanghai Scientific & Technical Publishers; p. 38–43 [in Chinese].
[15] Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Mathematical Programming 2002;91:201–13.