The convergence rate of a restart MFR conjugate gradient method with inexact line search


Aiping Qu a,*, Donghui Li b, Min Li a

a Department of Mathematics, Huaihua University, Hunan 418000, China
b School of Mathematical Sciences, South China Normal University, Guangzhou 510631, China

Keywords: Unconstrained optimization; Restart; Fletcher–Reeves conjugate gradient method; n-step quadratic convergence

Abstract

In this paper, we investigate the convergence rate of the modified Fletcher–Reeves (MFR) method proposed by Zhang et al. [L. Zhang, W.J. Zhou, D.H. Li, Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search, Numer. Math. 104 (2006) 561–572]. Under reasonable conditions, we show that the MFR method with an inexact line search is n-step superlinearly and even quadratically convergent if a suitable restart technique is used. Some numerical results are also reported to verify the theoretical results. © 2012 Elsevier Inc. All rights reserved.

1. Introduction

Conjugate gradient methods are quite useful in large scale unconstrained optimization. Consider the general unconstrained problem

$$ \min f(x), \quad x \in \mathbb{R}^n, \qquad (1.1) $$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The iterates of conjugate gradient methods are generated by

$$ x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \qquad (1.2) $$

with

$$ d_k = \begin{cases} -g_0, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k > 0, \end{cases} \qquad (1.3) $$

where $\alpha_k$ is a stepsize and $\beta_k$ is a parameter. Throughout, we use $g(x)$ to denote the gradient of $f$ at $x$ and abbreviate $g(x_k)$ as $g_k$. The Fletcher–Reeves (FR) method proposed by Fletcher and Reeves [1] is a well-known nonlinear conjugate gradient method. In the FR method, the parameter $\beta_k$ is specified by

$$ \beta^{FR}_k = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad (1.4) $$

where $\|\cdot\|$ stands for the Euclidean norm. Recently, Zhang et al. [2] proposed a modified FR method (MFR). An advantage of the MFR method is that the direction generated by the method is always a descent direction. Under appropriate conditions, the MFR method with an inexact line search is globally convergent [2].

This work was supported by the NSF of China grant 11071087 and the SRF of Huaihua University grant HHUY2010-04.

* Corresponding author.
E-mail addresses: [email protected] (A. Qu), [email protected] (D. Li), [email protected] (M. Li).
© 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.amc.2012.05.023


In this paper, we further study the MFR method, focusing on its convergence rate. The convergence rate of standard conjugate gradient methods has been well studied, and their linear convergence is well known. Indeed, Crowder and Wolfe [3] gave an example showing that the rate of convergence is exactly linear. Powell [4] showed that if the initial search direction is an arbitrary downhill direction for a convex quadratic function, then either termination occurs or the rate of convergence is only linear. However, if some restart strategy is used, the convergence rate of the FR method can be superlinear or quadratic [5,6]. The simplest restart procedure for a nonlinear conjugate gradient method is to restart the iteration every r steps by setting $\beta_k = 0$, that is, by taking a steepest descent step every r iterations. Cohen [6] and Burmeister [7] proved the n-step quadratic convergence of the restarted conjugate gradient method with exact line search for general objective functions, that is,

$$ \|x_{kr+n} - x^*\| = O\big(\|x_{kr} - x^*\|^2\big). \qquad (1.5) $$

Other restart strategies and related work can be found in [8–12]. Quite recently, Li and Tian [13] studied the restarted modified Polak–Ribiere–Polyak (MPRP) conjugate gradient method and proved its n-step quadratic convergence under some additional assumptions. It should be pointed out that the n-step quadratic convergence of the MPRP method is retained even if an inexact line search is used. The purpose of this paper is to investigate the n-step quadratic convergence of the restarted MFR method with inexact line search. We will show that, under some reasonable conditions, the restarted MFR method with an Armijo-type line search is also n-step quadratically convergent.

The paper is organized as follows. In Section 2, we introduce a restart strategy into the MFR method and propose a restart MFR method (called the RMFR method). In Section 3, we establish the n-step superlinear/quadratic convergence rate of the RMFR method. In Section 4, we report some numerical results for the RMFR method.

2. The restart MFR method and its global convergence

In this section, we introduce a restart strategy into the MFR method and propose a restart MFR method (called the RMFR method). The steps of the method are stated as follows; an illustrative implementation sketch follows the algorithm.

Algorithm 2.1 (RMFR method).

Step 0: Given constants $\delta_1 \in (0, 1/2)$, $0 \le \delta_2 < m(1/2 - \delta_1)$, $\rho \in (0,1)$ and a tolerance $\epsilon \in (0,1)$. Given a positive sequence $\{\epsilon_k\}$ converging to zero. Choose an initial point $x_0 \in \mathbb{R}^n$ and let $k = 0$.
Step 1: If $\|g_k\| \le \epsilon$, stop. Otherwise set $d_k = -g_k$.
Step 2: Compute

$$ \gamma_k = \frac{\epsilon_k \|g_k\|^2}{d_k^T \big( g(x_k + \epsilon_k d_k) - g_k \big)}. \qquad (2.1) $$

Let $\alpha_k$ be the largest element of $\{ |\gamma_k| \rho^j \mid j = 0, 1, 2, \ldots \}$ satisfying

$$ f(x_k + \alpha_k d_k) \le f(x_k) + \delta_1 \alpha_k g_k^T d_k - \delta_2 \alpha_k^2 \|d_k\|^2. \qquad (2.2) $$

Step 3: Let $x_{k+1} = x_k + \alpha_k d_k$ and $k := k + 1$.
Step 4: If $\|g_k\| \le \epsilon$, stop.
Step 5: If $k = r$, let $x_0 := x_k$, reset $k := 0$ and go to Step 1.
Step 6: Compute $d_k$ by

$$ d_k = -\theta_k g_k + \beta^{FR}_k d_{k-1}, \qquad (2.3) $$

where

$$ \beta^{FR}_k = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad \theta_k = \frac{d_{k-1}^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad y_{k-1} = g_k - g_{k-1}. \qquad (2.4) $$

Go to Step 2.
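To make the flow of Algorithm 2.1 concrete, the following Python sketch implements the restart loop, the trial steplength (2.1), the backtracking test (2.2) and the MFR direction (2.3)–(2.4). It is an illustrative sketch under stated assumptions, not the authors' Fortran code: the callables f and grad, the default parameter values and the simple eps_k rule are choices made for this example.

import numpy as np

def rmfr(f, grad, x0, r=50, delta1=1e-4, delta2=0.0, rho=0.5,
         tol=1e-6, max_iter=50000):
    """Sketch of the restart MFR (RMFR) method (Algorithm 2.1)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                        # Step 1: steepest-descent (restart) direction
    k = 0                         # iterations since the last restart
    total = 0                     # total iteration counter
    while np.linalg.norm(g) > tol and total < max_iter:
        # Step 2: trial steplength gamma_k of (2.1); eps_k is any positive
        # sequence converging to zero (an assumption of this sketch)
        eps_k = 1e-8 / (total + 1)
        denom = d @ (grad(x + eps_k * d) - g)
        gamma = eps_k * (g @ g) / denom if denom != 0 else 1.0
        # Backtracking: the largest |gamma| * rho**j satisfying (2.2)
        alpha, fx, gTd, dd = abs(gamma), f(x), g @ d, d @ d
        while f(x + alpha * d) > fx + delta1 * alpha * gTd - delta2 * alpha**2 * dd:
            alpha *= rho
        x_new = x + alpha * d     # Step 3
        g_new = grad(x_new)
        total += 1
        k += 1
        if k == r:                # Step 5: restart every r iterations
            k = 0
            d = -g_new
        else:                     # Step 6: MFR direction (2.3)-(2.4)
            beta = (g_new @ g_new) / (g @ g)
            theta = d @ (g_new - g) / (g @ g)
            d = -theta * g_new + beta * d
        x, g = x_new, g_new
    return x

Because theta_k is chosen as in (2.4), the sketch automatically satisfies $g_k^T d_k = -\|g_k\|^2$ (see (2.5) below), so g @ d is always negative and the backtracking loop terminates for any continuously differentiable objective that is bounded below.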

Remark.
1. If we remove Step 5 from the algorithm, the method reduces to the MFR method in [2].
2. The scalar $\gamma_k$ is an approximation to the exact steplength. It was used in [15,13] as the initial steplength.

It is easy to see from (2.3) and (2.4) that $d_k$ is a descent direction of $f$ at $x_k$. In fact, it satisfies


$$ d_k^T g_k = -\|g_k\|^2. \qquad (2.5) $$

This implies

$$ \|g_k\| \le \|d_k\|. \qquad (2.6) $$
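For completeness, here is the short inductive verification of (2.5), written out as a worked equation; it uses only (2.3), (2.4) and the convention $d_0 = -g_0$ (so that $g_0^T d_0 = -\|g_0\|^2$):

\[
\begin{aligned}
g_k^T d_k
  &= -\theta_k \|g_k\|^2 + \beta^{FR}_k\, g_k^T d_{k-1} \\
  &= -\frac{d_{k-1}^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}\,\|g_k\|^2
     + \frac{\|g_k\|^2}{\|g_{k-1}\|^2}\, g_k^T d_{k-1} \\
  &= \frac{\|g_k\|^2}{\|g_{k-1}\|^2}\, g_{k-1}^T d_{k-1}
   = -\|g_k\|^2,
\end{aligned}
\]

where the last equality uses the induction hypothesis $g_{k-1}^T d_{k-1} = -\|g_{k-1}\|^2$. Then (2.6) follows from the Cauchy–Schwarz inequality, since $\|g_k\|^2 = |g_k^T d_k| \le \|g_k\|\,\|d_k\|$.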

The remainder of this section is devoted to the global convergence of the RMFR method. To this end, we make the following assumptions.

Assumption (A)
(H1) The level set $\Omega = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is bounded.
(H2) $f$ is twice continuously differentiable and uniformly convex, i.e., there are positive constants $M \ge m$ such that

$$ m\|d\|^2 \le d^T \nabla^2 f(x)\, d \le M\|d\|^2, \qquad \forall x, d \in \mathbb{R}^n, $$

where $\nabla^2 f(x)$ denotes the Hessian of $f$ at $x$.

It is obvious that under Assumption (A), problem (1.1) has a unique solution $x^*$ which satisfies

$$ \tfrac{1}{2} m \|x - x^*\|^2 \le f(x) - f(x^*) \le \tfrac{1}{2} M \|x - x^*\|^2, \qquad \forall x \in \mathbb{R}^n, \qquad (2.7) $$

and

$$ m\|x_k - x^*\| \le \|g_k\| \le M\|x_k - x^*\|. \qquad (2.8) $$

Lemma 2.1. Let Assumption (A) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then, when k is sufficiently large,

$$ \gamma_k \le \frac{1}{m}. \qquad (2.9) $$

Proof. Denote $\bar{B}_k = \int_0^1 \nabla^2 f(x_k + \gamma \epsilon_k d_k)\, d\gamma$. By the mean value theorem and (2.6), we have

$$ \gamma_k = \frac{\epsilon_k \|g_k\|^2}{d_k^T \big( g(x_k + \epsilon_k d_k) - g_k \big)} = \frac{\|g_k\|^2}{d_k^T \bar{B}_k d_k} \le \frac{\|g_k\|^2}{m\|d_k\|^2} \le \frac{1}{m}. \qquad (2.10) $$

The proof is complete. □

The following theorem shows the global convergence of the RMFR method. Its proof is similar to that of Theorem 3.3 in [2] and is omitted.

Theorem 2.1. Let Assumption (A) hold. Then the sequence $\{x_k\}$ generated by the RMFR method converges to the unique solution of problem (1.1).

3. N-step quadratic convergence of the restart MFR method

In this section, we prove the n-step quadratic convergence of the RMFR method proposed in the last section. We first prove the following lemmas.

Lemma 3.1. Let Assumption (A) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then there are positive constants $C_i$, $i = 1, 2, 3, 4$, such that the following inequalities hold for all k:

$$ \|g_{k+1}\| \le C_1\|d_k\|, \qquad |\beta^{FR}_{k+1}| \le C_2, \qquad |\theta_{k+1}| \le C_3, \qquad \|d_{k+1}\| \le C_4\|d_k\|. \qquad (3.11) $$

Proof. Denote $\hat{B}_k = \int_0^1 \nabla^2 f(x_k + s\alpha_k d_k)\,ds$ and $\bar{B}_k = \int_0^1 \nabla^2 f(x_k + s\epsilon_k d_k)\,ds$. It follows from (2.6), (2.9) and the mean-value theorem that

$$ \|g_{k+1}\| = \|g_k + (g_{k+1} - g_k)\| \le \|g_k\| + |\alpha_k|\,\|\hat{B}_k d_k\| \le \|d_k\| + \frac{M}{m}\|d_k\| = \Big(1 + \frac{M}{m}\Big)\|d_k\| \triangleq C_1\|d_k\|. $$

By the definition of $\beta^{FR}_k$ and $\gamma_k$, we have

$$ \beta^{FR}_{k+1} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2} = \frac{\|g_k\|^2 + (g_{k+1} - g_k)^T(g_{k+1} + g_k)}{\|g_k\|^2} = 1 + \frac{\alpha_k (g_{k+1} + g_k)^T \hat{B}_k d_k}{\gamma_k\, d_k^T \bar{B}_k d_k}. $$

Since $\alpha_k \le |\gamma_k|$, we derive from (2.6) and the first inequality of (3.11) that

$$ |\beta^{FR}_{k+1}| \le 1 + \frac{|\alpha_k|\,(\|g_{k+1}\| + \|g_k\|)\,M\|d_k\|}{|\gamma_k|\,m\|d_k\|^2} \le 1 + \frac{M}{m}\cdot\frac{\|g_{k+1}\| + \|g_k\|}{\|d_k\|} \le 1 + \frac{M}{m}(C_1 + 1) \triangleq C_2. $$

By the definition of $\theta_k$, we have

$$ \frac{m\,\alpha_k\|d_k\|^2}{\|g_k\|^2} \le \theta_{k+1} = \frac{d_k^T(g_{k+1} - g_k)}{\|g_k\|^2} = \frac{\alpha_k\, d_k^T \hat{B}_k d_k}{\|g_k\|^2} \le \frac{M\,\alpha_k\|d_k\|^2}{\|g_k\|^2}. $$

By the line search rule, if $\alpha_k \ne \gamma_k$, then $\rho^{-1}\alpha_k$ does not satisfy inequality (2.2). This implies

$$ f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) > \delta_1 \rho^{-1}\alpha_k g_k^T d_k - \delta_2 \rho^{-2}\alpha_k^2\|d_k\|^2. \qquad (3.12) $$

By the mean-value theorem again, there exists $t_k \in (0,1)$ such that

$$ f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) = \rho^{-1}\alpha_k\, g(x_k + t_k\rho^{-1}\alpha_k d_k)^T d_k = \rho^{-1}\alpha_k g_k^T d_k + \rho^{-1}\alpha_k\big(g(x_k + t_k\rho^{-1}\alpha_k d_k) - g_k\big)^T d_k \le \rho^{-1}\alpha_k g_k^T d_k + M\rho^{-2}\alpha_k^2\|d_k\|^2. $$

Substituting the last inequality into (3.12), we get

$$ \alpha_k\|d_k\|^2 \ge \frac{(\delta_1 - 1)\rho}{M + \delta_2}\, g_k^T d_k = \frac{(1 - \delta_1)\rho}{M + \delta_2}\,\|g_k\|^2. \qquad (3.13) $$

If $\alpha_k = \gamma_k$, we obviously have $\alpha_k\|d_k\|^2/\|g_k\|^2 = \|d_k\|^2/(d_k^T\bar{B}_k d_k) \ge 1/M$. Letting $c_1 = \min\{1/M,\ (1-\delta_1)\rho/(M+\delta_2)\}$, we get

$$ c_1 m \le \frac{m\,\alpha_k\|d_k\|^2}{\|g_k\|^2} \le \theta_{k+1} \le \frac{M\,\alpha_k\|d_k\|^2}{\|g_k\|^2} \le \frac{M}{m}. $$

Therefore, we obtain $|\theta_{k+1}| \le \max\{c_1 m,\ M/m\} \triangleq C_3$. At last, we get by the definition of $d_k$

$$ \|d_{k+1}\| \le |\theta_{k+1}|\,\|g_{k+1}\| + |\beta^{FR}_{k+1}|\,\|d_k\| \le (C_3 C_1 + C_2)\|d_k\| \triangleq C_4\|d_k\|. $$

The proof is complete. □

Lemma 3.2. Let Assumption (A) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then it holds that

$$ d_k \to 0. \qquad (3.14) $$

Proof. It follows from Lemma 3.1 that $\|d_{kr+i}\| \le C_4^i\|d_{kr}\|$ for $i = 0, 1, \ldots, n-1$. Since the sequence $\{x_k\}$ generated by Algorithm 2.1 converges to the unique solution $x^*$ of (1.1), which satisfies $g(x^*) = 0$, we obtain $d_{kr} = -g_{kr} \to -g(x^*) = 0$ as $k \to \infty$. This yields the desired conclusion. □

The following theorem shows that $\gamma_k$ is accepted as the steplength when k is sufficiently large.

Theorem 3.1. Let Assumption (A) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then, when k is sufficiently large, the initial steplength $\gamma_k$ satisfies the Armijo-type line search condition (2.2).

Proof. By the definition of $\gamma_k$, Lemma 3.2 and the fact $d_k \to 0$, we obtain

$$ \begin{aligned} f(x_k + \gamma_k d_k) &= f(x_k) + \gamma_k g_k^T d_k + \tfrac{1}{2}\gamma_k^2 d_k^T\bar{B}_k d_k + \gamma_k^2\, o(\|d_k\|^2) = f(x_k) + \gamma_k g_k^T d_k + \tfrac{1}{2}\gamma_k\|g_k\|^2 + \gamma_k^2\, o(\|d_k\|^2) \\ &= f(x_k) + \delta_1\gamma_k g_k^T d_k - \big(\tfrac{1}{2} - \delta_1\big)\gamma_k\|g_k\|^2 + \gamma_k^2\, o(\|d_k\|^2) \le f(x_k) + \delta_1\gamma_k g_k^T d_k - \big(\tfrac{1}{2} - \delta_1\big) m\,\gamma_k^2\|d_k\|^2 + \gamma_k^2\, o(\|d_k\|^2) \\ &= f(x_k) + \delta_1\gamma_k g_k^T d_k - \delta_2\gamma_k^2\|d_k\|^2 - c\,\gamma_k^2\|d_k\|^2 + \gamma_k^2\, o(\|d_k\|^2), \end{aligned} $$

where $c = \tfrac{m}{2} - m\delta_1 - \delta_2 > 0$. This implies that the following inequality holds for all k sufficiently large:

$$ f(x_k + \gamma_k d_k) \le f(x_k) + \delta_1\gamma_k g_k^T d_k - \delta_2\gamma_k^2\|d_k\|^2. $$

In other words, when k is sufficiently large, the initial steplength $\alpha_k = \gamma_k$ satisfies the Armijo-type line search condition (2.2). □

We now prepare to show the n-step quadratic convergence of the RMFR method, which requires the following additional assumption.

Assumption (B)
(H3) In some neighborhood N of $\Omega$, f is three times continuously differentiable.

Define the quadratic function

$$ \hat{f}_{kr}(x) = f(x_{kr}) + \nabla f(x_{kr})^T(x - x_{kr}) + \tfrac{1}{2}(x - x_{kr})^T\nabla^2 f(x_{kr})(x - x_{kr}). \qquad (3.15) $$

Let $\{x^i_{kr}\}$ and $\{d^i_{kr}\}$ be the iterates and directions generated by the RMFR method applied to the quadratic function $\hat{f}_{kr}$ with initial point $x^0_{kr} = x_{kr}$. Specifically, the sequence $\{x^i_{kr}\}$ is generated by the following process:

$$ x^0_{kr} = x_{kr}, \qquad x^{i+1}_{kr} = x^i_{kr} + \alpha^i_{kr} d^i_{kr}, \quad i = 0, 1, \ldots, $$

$$ d^i_{kr} = \begin{cases} -g^0_{kr}, & \text{if } i = 0, \\ -\theta^i_{kr} g^i_{kr} + \beta^i_{kr} d^{i-1}_{kr}, & \text{if } i \ge 1, \end{cases} $$

where $g^i_{kr}$ denotes the gradient of $\hat{f}_{kr}$ at $x^i_{kr}$ and

$$ \beta^i_{kr} = \frac{\|g^i_{kr}\|^2}{\|g^{i-1}_{kr}\|^2}, \qquad \theta^i_{kr} = \frac{(d^{i-1}_{kr})^T y^{i-1}_{kr}}{\|g^{i-1}_{kr}\|^2}, \qquad y^{i-1}_{kr} = g^i_{kr} - g^{i-1}_{kr}. $$

As we have shown in Theorem 3.1, when k is sufficiently large the steplength $\gamma_k$ is always accepted. Since $\hat{f}_{kr}$ is a quadratic function, this steplength coincides with the one obtained by the exact line search, and consequently $(g^i_{kr})^T d^{i-1}_{kr} = 0$. Moreover, there is an index $j(k) \le n$ such that $x^{j(k)}_{kr}$ is the minimizer of $\hat{f}_{kr}$. In order to prove the n-step quadratic convergence of the RMFR method, we first prove several lemmas which are similar to Lemmas (A.1)–(A.10) in [6]. To simplify the notation, in Lemmas 3.3–3.6 the index k stands for a restart index kr, and $x^i_k$, $d^i_k$, $g^i_k$, $\alpha^i_k$ denote the corresponding quantities generated on the quadratic model built at $x_k$.

Lemma 3.3. Let Assumptions (A) and (B) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then

$$ \|\nabla^2 f(x_{k+i}) - \nabla^2 f(x_k)\| = O(\|d_k\|), \qquad (3.16) $$

$$ \|\hat{B}_{k+i} - \nabla^2 f(x_k)\| = O(\|d_k\|). \qquad (3.17) $$

Proof. First, we prove (3.16). Let $\bar{M}$ be an upper bound of $\|\nabla^3 f(x)\|$ on N. From Lemma 3.1 and $\alpha_{k+l} \le \gamma_{k+l} \le 1/m$, we have

$$ \|\nabla^2 f(x_{k+i}) - \nabla^2 f(x_k)\| \le \sum_{l=0}^{i-1}\|\nabla^2 f(x_{k+l+1}) - \nabla^2 f(x_{k+l})\| = \sum_{l=0}^{i-1}\Big\|\int_0^1 \nabla^3 f(x_{k+l} + \gamma\alpha_{k+l}d_{k+l})\,\alpha_{k+l}d_{k+l}\,d\gamma\Big\| \le \frac{\bar{M}}{m}\sum_{l=0}^{i-1}\|d_{k+l}\| = O(\|d_k\|). $$

Next, we prove (3.17). Indeed, we have from Lemma 3.1 and (3.16)

$$ \|\hat{B}_{k+i} - \nabla^2 f(x_k)\| \le \Big\|\int_0^1\big[\nabla^2 f(x_{k+i} + \gamma\alpha_{k+i}d_{k+i}) - \nabla^2 f(x_{k+i})\big]\,d\gamma\Big\| + \|\nabla^2 f(x_{k+i}) - \nabla^2 f(x_k)\| \le \frac{\bar{M}}{m}\|d_{k+i}\| + O(\|d_k\|) = O(\|d_k\|). $$

The proof is complete. □

Lemma 3.4. Let Assumptions (A) and (B) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then it holds that

$$ \|d_{k+i+1} - d^{i+1}_k\| = O(\|d_{k+i} - d^i_k\|) + O(\|g_{k+i+1} - g^{i+1}_k\|) + O(\|g_{k+i} - g^i_k\|) + O(\|d_k\|^2). \qquad (3.18) $$

Proof. By the definition of $d_k$, we have

$$ \|d_{k+i+1} - d^{i+1}_k\| = \|-\theta_{k+i+1}g_{k+i+1} + \beta_{k+i+1}d_{k+i} + \theta^{i+1}_k g^{i+1}_k - \beta^{i+1}_k d^i_k\| \le \|\theta_{k+i+1}g_{k+i+1} - \theta^{i+1}_k g^{i+1}_k\| + \|\beta_{k+i+1}d_{k+i} - \beta^{i+1}_k d^i_k\|. $$

First, we prove that

$$ \|\beta_{k+i+1}d_{k+i} - \beta^{i+1}_k d^i_k\| = O(\|d_{k+i} - d^i_k\|) + O(\|g_{k+i+1} - g^{i+1}_k\|) + O(\|g_{k+i} - g^i_k\|) + O(\|d_k\|^2). \qquad (3.19) $$

From Lemma 3.1 and Theorem 3.1, we have $\alpha_{k+i} = \gamma_{k+i}$ for all k sufficiently large, and hence, as in the proof of Lemma 3.1,

$$ \beta_{k+i+1} = 1 + \frac{(g_{k+i+1} + g_{k+i})^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}}, \qquad \beta^{i+1}_k = 1 + \frac{(g^{i+1}_k + g^i_k)^T\nabla^2 f(x_k)d^i_k}{(d^i_k)^T\nabla^2 f(x_k)d^i_k}. $$

Therefore

$$ \|\beta_{k+i+1}d_{k+i} - \beta^{i+1}_k d^i_k\| \le \|d_{k+i} - d^i_k\| + \Big\|\frac{g_{k+i+1}^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}}\,d_{k+i} - \frac{(g^{i+1}_k)^T\nabla^2 f(x_k)d^i_k}{(d^i_k)^T\nabla^2 f(x_k)d^i_k}\,d^i_k\Big\| + \Big\|\frac{g_{k+i}^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}}\,d_{k+i} - \frac{(g^i_k)^T\nabla^2 f(x_k)d^i_k}{(d^i_k)^T\nabla^2 f(x_k)d^i_k}\,d^i_k\Big\|. $$

Adding and subtracting terms and bounding each resulting difference by means of Lemma 3.1, Lemma 3.3 and the lower bounds $d_{k+i}^T\bar{B}_{k+i}d_{k+i} \ge m\|d_{k+i}\|^2$ and $(d^i_k)^T\nabla^2 f(x_k)d^i_k \ge m\|d^i_k\|^2$, we obtain

$$ \Big\|\frac{g_{k+i+1}^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}}\,d_{k+i} - \frac{(g^{i+1}_k)^T\nabla^2 f(x_k)d^i_k}{(d^i_k)^T\nabla^2 f(x_k)d^i_k}\,d^i_k\Big\| = O(\|g_{k+i+1} - g^{i+1}_k\|) + O(\|d_{k+i} - d^i_k\|) + O(\|d_k\|^2) \qquad (3.20) $$

and, in exactly the same way,

$$ \Big\|\frac{g_{k+i}^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}}\,d_{k+i} - \frac{(g^i_k)^T\nabla^2 f(x_k)d^i_k}{(d^i_k)^T\nabla^2 f(x_k)d^i_k}\,d^i_k\Big\| = O(\|g_{k+i} - g^i_k\|) + O(\|d_{k+i} - d^i_k\|) + O(\|d_k\|^2). \qquad (3.21) $$

From (3.20) and (3.21), we obtain (3.19). Next, we prove

$$ \|\theta_{k+i+1}g_{k+i+1} - \theta^{i+1}_k g^{i+1}_k\| = O(\|g_{k+i+1} - g^{i+1}_k\|) + O(\|d_k\|^2). \qquad (3.22) $$

Let $\bar{B}_{k+i}$ and $\hat{B}_{k+i}$ be defined as in the proofs of Lemma 2.1 and Lemma 3.1. Since Assumptions (A) and (B) hold and $\epsilon_k \to 0$, we have

$$ \Big|\frac{d_{k+i}^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}} - 1\Big| \le \frac{\|\hat{B}_{k+i} - \bar{B}_{k+i}\|\,\|d_{k+i}\|^2}{m\|d_{k+i}\|^2} \le \frac{L}{2m}\,|\alpha_{k+i} - \epsilon_{k+i}|\,\|d_{k+i}\| = O(\|d_{k+i}\|), $$

where L is the Lipschitz constant of $\nabla^2 f$ on the set N. Then, since $\alpha_k = \gamma_k$ for all k sufficiently large, we get by the mean-value theorem

$$ g_{k+i+1}^T d_{k+i} = g(x_{k+i} + \gamma_{k+i}d_{k+i})^T d_{k+i} = g_{k+i}^T d_{k+i} + \gamma_{k+i}\,d_{k+i}^T\hat{B}_{k+i}d_{k+i} = \Big(\frac{d_{k+i}^T\hat{B}_{k+i}d_{k+i}}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}} - 1\Big)\|g_{k+i}\|^2 = O(\|d_{k+i}\|)\,\|g_{k+i}\|^2. $$

On the other hand, $(g^{i+1}_k)^T d^i_k = 0$ and $(g^i_k)^T d^i_k = -\|g^i_k\|^2$ imply $\theta^{i+1}_k = 1$, while $g_{k+i}^T d_{k+i} = -\|g_{k+i}\|^2$ gives $\theta_{k+i+1} = 1 + g_{k+i+1}^T d_{k+i}/\|g_{k+i}\|^2$. Hence

$$ \|\theta_{k+i+1}g_{k+i+1} - \theta^{i+1}_k g^{i+1}_k\| \le \|g_{k+i+1} - g^{i+1}_k\| + \frac{|g_{k+i+1}^T d_{k+i}|}{\|g_{k+i}\|^2}\,\|g_{k+i+1}\| = \|g_{k+i+1} - g^{i+1}_k\| + O(\|d_k\|^2), $$

which proves (3.22). Combining (3.19) and (3.22) yields (3.18). The proof is complete. □

Lemma 3.5. Let Assumptions (A) and (B) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then we have

$$ \|g_{k+i+1} - g^{i+1}_k\| \le \|g_{k+i} - g^i_k\| + O(\|d_k\|^2) + M\|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\|. \qquad (3.23) $$

Proof. By the definitions of $g_{k+i+1}$ and $g^{i+1}_k$, we have

$$ \begin{aligned} \|g_{k+i+1} - g^{i+1}_k\| &= \|g_{k+i} + \alpha_{k+i}\hat{B}_{k+i}d_{k+i} - g^i_k - \alpha^i_k\nabla^2 f(x_k)d^i_k\| \\ &\le \|g_{k+i} - g^i_k\| + \|(\nabla^2 f(x_k) - \hat{B}_{k+i})\,\alpha^i_k d^i_k\| + \|\hat{B}_{k+i}(\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k)\| \\ &\le \|g_{k+i} - g^i_k\| + \frac{1}{m}\,\|\nabla^2 f(x_k) - \hat{B}_{k+i}\|\,\|d^i_k\| + M\|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\| \\ &\le \|g_{k+i} - g^i_k\| + O(\|d_k\|^2) + M\|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\|. \end{aligned} $$

The proof is complete. □

Lemma 3.6. Let Assumptions (A) and (B) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then we have

$$ \|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\| \le O(\|g_{k+i} - g^i_k\|) + O(\|d_{k+i} - d^i_k\|) + O(\|d_k\|^2). \qquad (3.24) $$

Proof. By the definition of $\alpha_k$ in the RMFR method, we have, for all k sufficiently large,

$$ \|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\| = \Big\|\frac{\|g_{k+i}\|^2}{d_{k+i}^T\bar{B}_{k+i}d_{k+i}}\,d_{k+i} - \frac{\|g^i_k\|^2}{(d^i_k)^T\nabla^2 f(x_k)d^i_k}\,d^i_k\Big\|. $$

Adding and subtracting terms and bounding each difference in a way similar to the derivation of (3.20), we obtain (3.24). □

The next theorem plays an important role in the proof of the n-step quadratic convergence of the RMFR method.

Theorem 3.2. Let Assumptions (A) and (B) hold and the sequence $\{x_k\}$ be generated by the RMFR method. Then we have

$$ \|\alpha_{kr+i}d_{kr+i} - \alpha^i_{kr}d^i_{kr}\| = O(\|d_{kr}\|^2), \qquad i = 0, 1, \ldots, j(k) - 1, \qquad (3.25) $$

$$ \|\alpha_{kr+i}d_{kr+i} - \alpha^i_{kr}d^i_{kr}\| = O(\|x_{kr} - x^*\|^2), \qquad i = 0, 1, \ldots, j(k) - 1. \qquad (3.26) $$

Proof. We first prove the following relations by induction on i:

$$ \|g_{k+i} - g^i_k\| = O(\|d_k\|^2), \qquad (3.27) $$
$$ \|d_{k+i} - d^i_k\| = O(\|d_k\|^2), \qquad (3.28) $$
$$ \|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\| = O(\|d_k\|^2). \qquad (3.29) $$

For i = 0, the relations (3.27)–(3.29) follow from (3.24) and the fact that $g_k = g^0_k$ and $d_k = d^0_k$. Suppose that (3.27)–(3.29) hold for some $i \ge 0$; we show that they hold for i + 1. From (3.23) and the induction assumption, we have

$$ \|g_{k+i+1} - g^{i+1}_k\| \le \|g_{k+i} - g^i_k\| + O(\|d_k\|^2) + M\|\alpha_{k+i}d_{k+i} - \alpha^i_k d^i_k\| = O(\|d_k\|^2). \qquad (3.30) $$

We also have from (3.18)

$$ \|d_{k+i+1} - d^{i+1}_k\| = O(\|d_k\|^2). \qquad (3.31) $$

Then (3.24), (3.30) and (3.31) imply

$$ \|\alpha_{k+i+1}d_{k+i+1} - \alpha^{i+1}_k d^{i+1}_k\| = O(\|g_{k+i+1} - g^{i+1}_k\|) + O(\|d_{k+i+1} - d^{i+1}_k\|) + O(\|d_k\|^2) = O(\|d_k\|^2). \qquad (3.32) $$

The relations (3.27)–(3.29), applied at a restart index, give (3.25). The relation (3.26) then follows from (3.25) and the fact that $\|d_{kr}\| = \|g_{kr}\| \le M\|x_{kr} - x^*\|$. □

Now we establish the n-step quadratic convergence of the RMFR method.

Theorem 3.3. Under Assumptions (A) and (B), the sequence $\{x_k\}$ generated by Algorithm 2.1 is n-step quadratically convergent; that is, there exists a constant $c > 0$ such that

$$ \limsup_{k\to\infty}\frac{\|x_{kr+n} - x^*\|}{\|x_{kr} - x^*\|^2} \le c < \infty. \qquad (3.33) $$

Proof. Since the sequence $\{f(x_k)\}$ is decreasing and $j(k) \le n$, we have $f(x_{kr+n}) - f(x^*) \le f(x_{kr+j(k)}) - f(x^*)$. This together with (2.7) implies

$$ \|x_{kr+n} - x^*\| \le (M/m)^{(n-j(k))/2}\,\|x_{kr+j(k)} - x^*\|. \qquad (3.34) $$

On the other hand, we have

$$ \|x_{kr+j(k)} - x^{j(k)}_{kr}\| = \Big\|\sum_{i=0}^{j(k)-1}\big[(x_{kr+i+1} - x_{kr+i}) - (x^{i+1}_{kr} - x^i_{kr})\big]\Big\| \le \sum_{i=0}^{j(k)-1}\|\alpha_{kr+i}d_{kr+i} - \alpha^i_{kr}d^i_{kr}\| = O(\|x_{kr} - x^*\|^2), \qquad (3.35) $$

where the last equality follows from (3.26). Therefore, we obtain

$$ \|x_{kr+j(k)} - x^*\| \le \|x_{kr+j(k)} - x^{j(k)}_{kr}\| + \|x^{j(k)}_{kr} - x^*\| = O(\|x_{kr} - x^*\|^2) + \|x^{j(k)}_{kr} - x^*\|. \qquad (3.36) $$

Since $x^{j(k)}_{kr}$ is the exact minimizer of the quadratic function $\hat{f}_{kr}$, it can be regarded as the iterate generated by one Newton step for f starting from $x_{kr}$. By the quadratic convergence of Newton's method, it is not difficult to get $\|x^{j(k)}_{kr} - x^*\| = O(\|x_{kr} - x^*\|^2)$. Consequently, the n-step quadratic convergence of $\{x_k\}$ follows from (3.34) and (3.36). □

Remark. The proof of Theorem 3.3 is similar to the proofs in [6,14]; the difference is that here the line search is inexact (see the appendices of [6,14]).

4. Numerical experiments

In this section, we report some numerical experiments. We compare the performance of RMFR with that of the MFR method [2], the MPRP method [15] and the CG-DESCENT method [16]. The test problems are the unconstrained problems from Andrei [17]. All codes were written in Fortran and run on an IBM T60 PC with two 1.83 GHz CPU processors and 2.5 GB RAM. The CG-DESCENT code can be obtained from Hager's web page at http://www.math.ufl.edu/hager/papers/CG.

As pointed out by Powell [10], for large scale problems a restart strategy with $r \ge n$ becomes meaningless; Powell suggested that a relatively small value of r is preferable for a restarted conjugate gradient method applied to large scale problems. Following this suggestion, we try to find a good value of r for large scale problems through numerical experiments. To this end, we select 83 problems from [17], 36 of which are from the CUTE library. We set n = 1000 for all problems and test the RMFR method with r = 10, 50, 100, 400 and 800. The parameters in the RMFR method are set to $\rho = 0.5$, $\delta_1 = 10^{-4}$, $\delta_2 = 0$ and $\epsilon = 10^{-8}$, and we stop the iteration if $\|g_k\| \le 10^{-6}$ or the iteration number exceeds $5 \times 10^4$. Fig. 1 shows the performance of the RMFR method for the different values of r, evaluated with the performance profiles of Dolan and Moré [18]. We can see that the choice r = 50 seems to be the best when n is large. Keeping this in mind, we then compare the performance of the RMFR method (r = 50) with that of the MFR method [2], the MPRP method [15] and the CG-DESCENT method [16] in terms of CPU time, the number of function evaluations and the number of gradient evaluations on all 83 test problems.
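All comparisons below are reported as Dolan–Moré performance profiles [18], so a compact sketch of how one profile curve is computed may be useful. This is an illustrative Python sketch only; the function name, the array layout and the commented-out numbers are assumptions of the example, not data from these experiments.

import numpy as np

def performance_profile(T):
    """Dolan-More performance profile curves.

    T : (n_problems, n_solvers) array of positive costs (e.g. CPU times);
        use np.inf where a solver failed on a problem.
    Returns (taus, rho): rho[t, s] is the fraction of problems that solver s
    solves within a factor taus[t] of the best solver on each problem.
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)          # best cost on each problem
    ratios = T / best                            # performance ratios r_{p,s}
    taus = np.sort(np.unique(ratios[np.isfinite(ratios)]))
    rho = np.array([[np.mean(ratios[:, s] <= t) for s in range(T.shape[1])]
                    for t in taus])
    return taus, rho

# Hypothetical usage with made-up costs for four solvers on three problems:
# times = np.array([[1.0, 1.2, 2.0, 1.1],
#                   [0.5, 0.4, 0.6, np.inf],
#                   [3.0, 2.5, 2.4, 2.6]])
# taus, rho = performance_profile(times)

A solver's curve value at τ = 1 is the fraction of problems on which it is the fastest, and its value for large τ approaches the fraction of problems it eventually solves, which is how Figs. 1–4 should be read.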

Fig. 1. Performance profiles of the RMFR method with different restart numbers r (r = 10, 50, 100, 400, 800), based on CPU time.

Fig. 2. Performance profiles of CG-DESCENT, MPRP, MFR and RMFR based on CPU time (n = 1000).

Fig. 3. Performance profiles of CG-DESCENT, MPRP, MFR and RMFR based on the number of function evaluations (n = 1000).

Fig. 4. Performance profiles of CG-DESCENT, MPRP, MFR and RMFR based on the number of gradient evaluations.

Figs. 2–4 show the performance of these methods with respect to CPU time, the number of function evaluations and the number of gradient evaluations, respectively. Fig. 2 shows that the CG-DESCENT method solves about 52% of the test problems with the least CPU time, but the RMFR method outperforms the others when the performance ratio lies between 2 and 8. Fig. 4 shows that the MPRP method solves about 40% of the test problems with the least number of gradient evaluations, but the RMFR method outperforms the others when the performance ratio lies between 2 and 12. Fig. 3 shows that the RMFR method is competitive with the others in the number of function evaluations.

References

[1] R. Fletcher, C. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (1964) 149–154.
[2] L. Zhang, W.J. Zhou, D.H. Li, Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search, Numer. Math. 104 (2006) 561–572.
[3] H.P. Crowder, P. Wolfe, Linear convergence of the conjugate gradient method, IBM J. Res. Develop. 16 (1972) 431–433.
[4] M.J.D. Powell, Some convergence properties of the conjugate gradient method, Math. Program. 11 (1976) 42–49.
[5] K. Ritter, On the rate of superlinear convergence of a class of variable metric methods, Numer. Math. 35 (1980) 293–313.
[6] A. Cohen, Rate of convergence of several conjugate gradient algorithms, SIAM J. Numer. Anal. 9 (1972) 248–259.
[7] W. Burmeister, Die Konvergenzordnung des Fletcher–Powell-Algorithmus, Z. Angew. Math. Mech. 53 (1973) 693–699.
[8] E.M.L. Beale, A derivation of conjugate gradients, in: Numerical Methods for Nonlinear Optimization, Academic Press, London, 1972, pp. 39–43.
[9] M.F. McGuire, P. Wolfe, Evaluating a restart procedure for conjugate gradients, Report RC-4382, IBM Research Center, Yorktown Heights, 1973.
[10] M.J.D. Powell, Restart procedures for the conjugate gradient method, Math. Program. 12 (1977) 241–254.
[11] Y.H. Dai, Y.X. Yuan, Convergence properties of the Beale–Powell restart algorithm, Sci. China Ser. A 41 (1998) 1142–1150.
[12] Y.H. Dai, L.Z. Liao, D. Li, On restart procedures for the conjugate gradient method, Numer. Algorithms 35 (2004) 249–260.
[13] D.H. Li, B.S. Tian, n-Step quadratic convergence of the MPRP method with a restart strategy, J. Comput. Appl. Math. 235 (2011) 4978–4990.
[14] Y.X. Yuan, W.Y. Sun, Optimization Theory and Methods, Science Press, Beijing, 1997 (in Chinese).
[15] L. Zhang, W.J. Zhou, D.H. Li, A descent modified Polak–Ribiere–Polyak conjugate gradient method and its global convergence, IMA J. Numer. Anal. 26 (2006) 629–640.
[16] W.W. Hager, H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM J. Optim. 16 (2005) 170–192.
[17] N. Andrei, An unconstrained optimization test functions collection, AMO 10 (2008) 147–161.
[18] E.D. Dolan, J.J. Moré, Benchmarking optimization software with performance profiles, Math. Program. 91 (2002) 201–213.