Applied Mathematics and Computation 218 (2012) 11380–11390

The convergence rate of a restart MFR conjugate gradient method with inexact line search

Aiping Qu a,*, Donghui Li b, Min Li a

a Department of Mathematics, Huaihua University, Hunan 418000, China
b School of Mathematical Sciences, South China Normal University, Guangzhou 510631, China

Abstract

In this paper, we investigate the convergence rate of the modified Fletcher–Reeves (MFR) method proposed by Zhang et al. [L. Zhang, W.J. Zhou, D.H. Li, Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search, Numer. Math. 104 (2006) 561–572]. Under reasonable conditions, we show that the MFR method with some inexact line search is n-step superlinearly and even quadratically convergent if some restart technique is used. Some numerical results are also reported to verify the theoretical results.

Keywords: Unconstrained optimization; Restart; Fletcher–Reeves conjugate gradient method; n-step quadratic convergence
1. Introduction

Conjugate gradient methods are quite useful in large scale unconstrained optimization. Consider the general unconstrained problem

\[
\min f(x), \quad x \in \mathbb{R}^n, \tag{1.1}
\]

where f : R^n -> R is continuously differentiable. The iterates of conjugate gradient methods are generated by

\[
x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \tag{1.2}
\]

with

\[
d_k = \begin{cases} -g_0, & \text{if } k = 0,\\ -g_k + \beta_k d_{k-1}, & \text{if } k > 0, \end{cases} \tag{1.3}
\]

where \alpha_k is a stepsize and \beta_k is a parameter. Throughout, we use g(x) to denote the gradient of f at x and abbreviate g(x_k) as g_k. The Fletcher–Reeves (FR) method proposed by Fletcher and Reeves [1] is a well-known nonlinear conjugate gradient method. In the FR method, the parameter \beta_k is specified by

\[
\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \tag{1.4}
\]

where \|\cdot\| stands for the Euclidean norm of vectors. Recently, Zhang et al. [2] proposed a modified FR method (MFR). An advantage of the MFR method is that the direction it generates is always a descent direction. Under appropriate conditions, the MFR method with inexact line search is globally convergent [2].
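For readers who prefer pseudocode, the following Python sketch (ours, not the authors') spells out the generic FR iteration (1.2)–(1.4). The backtracking Armijo rule and the descent safeguard below are illustrative assumptions only; they are not part of the FR method or of the line searches analysed in this paper (with an inexact search the FR direction need not be a descent direction, which is precisely the defect the MFR modification of Section 2 removes).

import numpy as np

def fr_method(f, grad, x0, tol=1e-6, max_iter=1000):
    """Minimal sketch of the FR method (1.2)-(1.4), for illustration only."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        if g.dot(d) >= 0:                    # safeguard (our assumption): enforce descent
            d = -g
        alpha, fx, gTd = 1.0, f(x), g.dot(d)
        while f(x + alpha * d) > fx + 1e-4 * alpha * gTd:   # generic Armijo backtracking
            alpha *= 0.5
        x = x + alpha * d                    # (1.2)
        g_new = grad(x)
        beta = g_new.dot(g_new) / g.dot(g)   # beta_k^{FR}, (1.4)
        d = -g_new + beta * d                # (1.3)
        g = g_new
    return x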
[First-page footnote] This work was supported by the NSF of China grant 11071087 and the SRF of Huaihua University HHUY2010-04. * Corresponding author. E-mail addresses: [email protected] (A. Qu), [email protected] (D. Li), [email protected] (M. Li). doi:10.1016/j.amc.2012.05.023
In this paper, we further study the MFR method, focusing on its convergence rate. The convergence rate of the standard conjugate gradient methods has been well studied, and their linear convergence is well known. Indeed, Crowder and Wolfe [3] gave an example showing that the rate of convergence is exactly linear. Powell [4] showed that if the initial search direction is an arbitrary downhill direction, then for a convex quadratic function either termination occurs or the rate of convergence is only linear. However, if some restart strategy is used, the convergence rate of the FR method can be superlinear or even quadratic [5,6]. The simplest restart procedure in nonlinear conjugate gradient methods is to restart the iteration every r steps by setting \beta_k = 0, that is, by taking a steepest descent step every r iterations. Cohen [6] and Burmeister [7] proved the n-step quadratic convergence of the conjugate gradient method with exact line search for general objective functions, that is,

\[
\|x_{kr+n} - x^*\| = O\bigl(\|x_{kr} - x^*\|^2\bigr). \tag{1.5}
\]
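Property (1.5) can be checked numerically once the restart points x_{kr} and the minimizer x* of a test problem are known. The helper below is a hypothetical sketch (not from the paper); it assumes r = n, so that x_{kr+n} is the next restart point, and bounded output ratios are then consistent with n-step quadratic convergence.

import numpy as np

def nstep_rate_estimates(cycle_points, x_star):
    """Hypothetical helper: cycle_points[k] holds the restart iterate x_{kr}
    (with r = n) and x_star the known minimizer.  If (1.5) holds, the
    ratios ||x_{(k+1)r} - x*|| / ||x_{kr} - x*||^2 remain bounded."""
    errs = [np.linalg.norm(np.asarray(p) - x_star) for p in cycle_points]
    return [errs[k + 1] / errs[k] ** 2
            for k in range(len(errs) - 1) if errs[k] > 0]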
Other restart strategies and related work can be found in [8–12]. Quite recently, Li and Tian [13] studied the restart modified Polak–Ribière–Polyak (MPRP) conjugate gradient method and proved its n-step quadratic convergence under some additional assumptions. It should be pointed out that the n-step quadratic convergence of the MPRP method is retained even if some inexact line search is used. The purpose of this paper is to investigate the n-step quadratic convergence of the restart MFR method with inexact line search. We will show that, under some reasonable conditions, the restarted MFR method with an Armijo-type line search is also n-step quadratically convergent.

The paper is organized as follows. In Section 2, we introduce a restart strategy into the MFR method and propose a restart MFR method (called the RMFR method). In Section 3, we establish the n-step superlinear/quadratic convergence rate of the RMFR method. In Section 4, we report some numerical results for the RMFR method.

2. The restart MFR method and its global convergence

In this section, we introduce a restart strategy into the MFR method and propose a restart MFR method (called the RMFR method). The steps of the method are stated as follows.

Algorithm 2.1 (RMFR method).

Step 0: Given constants \delta_1 \in (0, 1/2), 0 \le \delta_2 < m(1/2 - \delta_1) and \rho \in (0, 1), a positive sequence \{\epsilon_k\} converging to zero, a tolerance \epsilon \ge 0 and a restart period r \ge 1. Choose an initial point x_0 \in \mathbb{R}^n, set d_0 = -g_0 and let k = 0.
Step 1: If \|g_k\| \le \epsilon, stop.
Step 2: Compute

\[
c_k = \frac{\epsilon_k \|g_k\|^2}{d_k^T \bigl(g(x_k + \epsilon_k d_k) - g_k\bigr)}. \tag{2.1}
\]

Let \alpha_k = \max\{|c_k|\rho^j \mid j = 0, 1, 2, \ldots\} satisfying

\[
f(x_k + \alpha_k d_k) \le f(x_k) + \delta_1 \alpha_k g_k^T d_k - \delta_2 \alpha_k^2 \|d_k\|^2. \tag{2.2}
\]

Step 3: Let x_{k+1} = x_k + \alpha_k d_k and k := k + 1.
Step 4: If \|g_k\| \le \epsilon, stop.
Step 5: If k = r, let x_0 := x_k, reset k = 0 and d_0 = -g_0 (restart), and go to Step 1.
Step 6: Compute d_k by

\[
d_k = -\theta_k g_k + \beta_k^{FR} d_{k-1}, \tag{2.3}
\]

where

\[
\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad \theta_k = \frac{d_{k-1}^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad y_{k-1} = g_k - g_{k-1}. \tag{2.4}
\]

Go to Step 2.

Remark.
1. If we remove Step 5 from the algorithm, the method reduces to the MFR method in [2].
2. The scalar c_k is an approximation of the exact steplength. It was used in [15,13] as an initial steplength.

It is easy to see from (2.3) and (2.4) that d_k is a descent direction of f at x_k. In fact, it satisfies
\[
d_k^T g_k = -\|g_k\|^2. \tag{2.5}
\]

Indeed, g_k^T d_k = -\theta_k\|g_k\|^2 + \beta_k^{FR} g_k^T d_{k-1} = (\|g_k\|^2/\|g_{k-1}\|^2)\, g_{k-1}^T d_{k-1}, so (2.5) follows by induction from d_0 = -g_0. By the Cauchy–Schwarz inequality, this implies

\[
\|g_k\| \le \|d_k\|. \tag{2.6}
\]
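The following Python sketch is our reading of Algorithm 2.1, not the authors' Fortran code. The default parameter values are taken from Section 4, while freezing \epsilon_k at 10^{-8} (instead of a sequence converging to zero) and the guard against a vanishing denominator in (2.1) are our assumptions.

import numpy as np

def rmfr(f, grad, x0, r=50, rho=0.5, delta1=1e-4, delta2=0.0,
         eps_k=1e-8, tol=1e-6, max_iter=50000):
    """Sketch of Algorithm 2.1 (RMFR); illustrative, with assumed defaults."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # initial/restart direction
    k = 0                                    # position within the current cycle
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:         # Steps 1/4: stopping test
            break
        # Step 2: initial trial steplength c_k from (2.1)
        denom = d.dot(grad(x + eps_k * d) - g)
        c = eps_k * g.dot(g) / denom if denom != 0 else 1.0  # guard: our assumption
        alpha = abs(c)                       # alpha_k = |c_k| rho^j satisfying (2.2)
        fx, gTd, dn2 = f(x), g.dot(d), d.dot(d)
        while f(x + alpha * d) > fx + delta1 * alpha * gTd - delta2 * alpha**2 * dn2:
            alpha *= rho
        x = x + alpha * d                    # Step 3
        g_new = grad(x)
        k += 1
        if k == r:                           # Step 5: restart every r steps
            k = 0
            d = -g_new
        else:                                # Step 6: MFR direction (2.3)-(2.4)
            gg_old = g.dot(g)
            theta = d.dot(g_new - g) / gg_old
            beta_fr = g_new.dot(g_new) / gg_old
            d = -theta * g_new + beta_fr * d
        g = g_new
    return x

With grad the gradient of a uniformly convex f, the directions produced above satisfy the descent identity (2.5), so the backtracking loop terminates for every k.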
The remainder of this section is devoted to the global convergence of the RMFR method. To this end, we make the following assumptions.

Assumption (A)

(H1) The level set \Omega = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\} is bounded.
(H2) f is twice continuously differentiable and uniformly convex, i.e., there are positive constants M \ge m such that

\[
m\|d\|^2 \le d^T \nabla^2 f(x)\, d \le M\|d\|^2, \qquad \forall x, d \in \mathbb{R}^n,
\]

where \nabla^2 f(x) denotes the Hessian of f at x.

It is obvious that under Assumption (A), problem (1.1) has a unique solution x^* which satisfies

\[
\frac{1}{2} m\|x - x^*\|^2 \le f(x) - f(x^*) \le \frac{1}{2} M\|x - x^*\|^2, \qquad \forall x \in \mathbb{R}^n, \tag{2.7}
\]

and

\[
m\|x_k - x^*\| \le \|g_k\| \le M\|x_k - x^*\|. \tag{2.8}
\]
Lemma 2.1. Let Assumption (A) hold and the sequence \{x_k\} be generated by the RMFR method. Then, for all k,

\[
c_k \le \frac{1}{m}. \tag{2.9}
\]

Proof. Denote B_k = \int_0^1 \nabla^2 f(x_k + \gamma \epsilon_k d_k)\, d\gamma. By the mean value theorem, we have

\[
c_k = \frac{\epsilon_k\|g_k\|^2}{d_k^T\bigl(g(x_k + \epsilon_k d_k) - g_k\bigr)} = \frac{\|g_k\|^2}{d_k^T B_k d_k} \le \frac{\|g_k\|^2}{m\|d_k\|^2} \le \frac{1}{m}, \tag{2.10}
\]

where the last inequality follows from (2.6). The proof is complete.

The following theorem shows the global convergence of the RMFR method. Its proof is similar to that of Theorem 3.3 in [2] and is omitted.

Theorem 2.1. Let Assumption (A) hold. Then the sequence \{x_k\} generated by the RMFR method converges to the unique solution of problem (1.1).
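As a quick sanity check of Lemma 2.1 (our illustration, not part of the paper): on the quadratic f(x) = \tfrac12 x^T A x with mI \preceq A \preceq MI we have g(x) = Ax, so (2.1) collapses exactly to c_k = \|g_k\|^2/(d_k^T A d_k), the exact minimizing steplength along d_k, and the bound c_k \le 1/m of (2.9) is immediate.

import numpy as np

# Hypothetical check of (2.9) on f(x) = 0.5 x^T A x, where g(x) = A x.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T   # spectrum in [m, M] = [1, 5]
x = rng.standard_normal(5)
g = A @ x
d = -g                                              # steepest descent direction
eps = 1e-8
c = eps * g.dot(g) / d.dot(A @ (x + eps * d) - g)   # (2.1); here ||g||^2 / d^T A d
exact = g.dot(g) / d.dot(A @ d)                     # exact line-search steplength
assert abs(c - exact) <= 1e-4 * exact               # c_k reproduces the exact step
assert c <= 1.0 + 1e-6                              # (2.9): c_k <= 1/m with m = 1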
3. n-Step quadratic convergence of the restart MFR method

In this section, we prove the n-step quadratic convergence of the RMFR method proposed in the last section. We first prove the following lemmas.

Lemma 3.1. Let Assumption (A) hold and the sequence \{x_k\} be generated by the RMFR method. Then there are positive constants C_i, i = 1, 2, 3, 4, such that the following inequalities hold for all k:

\[
\|g_{k+1}\| \le C_1\|d_k\|, \qquad |\beta^{FR}_{k+1}| \le C_2, \qquad |\theta_{k+1}| \le C_3, \qquad \|d_{k+1}\| \le C_4\|d_k\|. \tag{3.11}
\]

Proof. Denote \widehat B_k = \int_0^1 \nabla^2 f(x_k + s\alpha_k d_k)\, ds and B_k = \int_0^1 \nabla^2 f(x_k + s\epsilon_k d_k)\, ds. It follows from (2.6), (2.9) and the mean value theorem that

\[
\|g_{k+1}\| = \|g_k + (g_{k+1} - g_k)\| \le \|g_k\| + |\alpha_k|\,\|\widehat B_k d_k\| \le \|d_k\| + \frac{M}{m}\|d_k\| = \Bigl(1 + \frac{M}{m}\Bigr)\|d_k\| \triangleq C_1\|d_k\|.
\]

By the definition of \beta^{FR}_k and c_k, we have
\[
\beta^{FR}_{k+1} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2} = \frac{g_{k+1}^T(g_{k+1} - g_k) + g_{k+1}^T g_k}{\|g_k\|^2} = \frac{g_{k+1}^T g_k}{\|g_k\|^2} + \frac{\alpha_k\, g_{k+1}^T \widehat B_k d_k}{c_k\, d_k^T B_k d_k} = 1 + \frac{\alpha_k (g_{k+1} + g_k)^T \widehat B_k d_k}{c_k\, d_k^T B_k d_k},
\]

where we used g_{k+1} - g_k = \alpha_k \widehat B_k d_k and \|g_k\|^2 = c_k\, d_k^T B_k d_k. Since \alpha_k \le |c_k|, we derive from (2.6) and (3.11)

\[
|\beta^{FR}_{k+1}| \le 1 + \frac{|\alpha_k|\,(\|g_{k+1}\| + \|g_k\|)\, M\|d_k\|}{|c_k|\, m\|d_k\|^2} \le 1 + \frac{M}{m}\cdot\frac{\|g_{k+1}\| + \|g_k\|}{\|d_k\|} \le 1 + \frac{M}{m}(C_1 + 1) \triangleq C_2.
\]

By the definition of \theta_k, we have

\[
\frac{m\,\alpha_k\|d_k\|^2}{\|g_k\|^2} \le \theta_{k+1} = \frac{d_k^T(g_{k+1} - g_k)}{\|g_k\|^2} = \frac{\alpha_k\, d_k^T \widehat B_k d_k}{\|g_k\|^2} \le \frac{M\,\alpha_k\|d_k\|^2}{\|g_k\|^2}.
\]

By the line search rule, if \alpha_k \ne c_k, then \rho^{-1}\alpha_k does not satisfy inequality (2.2). This implies

\[
f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) > \delta_1 \rho^{-1}\alpha_k\, g_k^T d_k - \delta_2 \rho^{-2}\alpha_k^2 \|d_k\|^2. \tag{3.12}
\]

By the mean value theorem again, there exists t_k \in (0, 1) such that

\[
f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) = \rho^{-1}\alpha_k\, g(x_k + t_k\rho^{-1}\alpha_k d_k)^T d_k = \rho^{-1}\alpha_k\, g_k^T d_k + \rho^{-1}\alpha_k \bigl(g(x_k + t_k\rho^{-1}\alpha_k d_k) - g_k\bigr)^T d_k \le \rho^{-1}\alpha_k\, g_k^T d_k + M\rho^{-2}\alpha_k^2 \|d_k\|^2.
\]

Substituting the last inequality into (3.12), we get

\[
\alpha_k\|d_k\|^2 \ge \frac{(\delta_1 - 1)\rho}{M + \delta_2}\, g_k^T d_k = \frac{(1 - \delta_1)\rho}{M + \delta_2}\,\|g_k\|^2. \tag{3.13}
\]

If \alpha_k = c_k, we obviously have \alpha_k\|d_k\|^2/\|g_k\|^2 \ge 1/M, since c_k = \|g_k\|^2/(d_k^T B_k d_k) \ge \|g_k\|^2/(M\|d_k\|^2). Letting c_1 = \min\{1/M,\ (1 - \delta_1)\rho/(M + \delta_2)\} and using \alpha_k \le |c_k| \le \|g_k\|^2/(m\|d_k\|^2), we get

\[
c_1 m \le \frac{m\,\alpha_k\|d_k\|^2}{\|g_k\|^2} \le \theta_{k+1} \le \frac{M\,\alpha_k\|d_k\|^2}{\|g_k\|^2} \le \frac{M}{m}.
\]

Therefore, we obtain

\[
|\theta_{k+1}| \le \max\{c_1 m,\ M/m\} \triangleq C_3.
\]

At last, we get by the definition of d_k

\[
\|d_{k+1}\| \le |\theta_{k+1}|\,\|g_{k+1}\| + |\beta^{FR}_{k+1}|\,\|d_k\| \le (C_3 C_1 + C_2)\|d_k\| \triangleq C_4\|d_k\|.
\]

The proof is complete.

Lemma 3.2. Let Assumption (A) hold and the sequence \{x_k\} be generated by the RMFR method. Then it holds that
\[
d_k \to 0. \tag{3.14}
\]

Proof. It follows from Lemma 3.1 that \|d_{kr+i}\| \le C_4^i\|d_{kr}\| for i = 0, 1, \ldots, n - 1. Since the sequence \{x_k\} generated by Algorithm 2.1 converges to the unique solution x^* of (1.1), which satisfies g(x^*) = 0, we obtain d_{kr} = -g_{kr} \to -g(x^*) = 0 as k \to \infty. This yields the desired conclusion.

The following theorem shows the acceptance of c_k as a steplength when k is sufficiently large.

Theorem 3.1. Let Assumption (A) hold and the sequence \{x_k\} be generated by the RMFR method. Then, when k is sufficiently large, the initial steplength c_k satisfies the Armijo-type line search condition (2.2).

Proof. By the definition of c_k, Lemma 3.2 and the fact d_k \to 0, we obtain
\[
\begin{aligned}
f(x_k + c_k d_k) &= f(x_k) + c_k g_k^T d_k + \tfrac{1}{2} c_k^2\, d_k^T B_k d_k + c_k^2\, o(\|d_k\|^2) = f(x_k) + \tfrac{1}{2} c_k g_k^T d_k + c_k^2\, o(\|d_k\|^2)\\
&= f(x_k) + \delta_1 c_k g_k^T d_k - \bigl(\tfrac{1}{2} - \delta_1\bigr) c_k \|g_k\|^2 + c_k^2\, o(\|d_k\|^2) \le f(x_k) + \delta_1 c_k g_k^T d_k - \bigl(\tfrac{1}{2} - \delta_1\bigr) m c_k^2 \|d_k\|^2 + c_k^2\, o(\|d_k\|^2)\\
&= f(x_k) + \delta_1 c_k g_k^T d_k - \delta_2 c_k^2 \|d_k\|^2 - \gamma c_k^2 \|d_k\|^2 + c_k^2\, o(\|d_k\|^2),
\end{aligned}
\]
where \gamma = m/2 - m\delta_1 - \delta_2 > 0 and we have used c_k\, d_k^T B_k d_k = \|g_k\|^2 = -g_k^T d_k together with m c_k\|d_k\|^2 \le \|g_k\|^2. Since \gamma > 0, this implies that the following inequality holds for all k sufficiently large:

\[
f(x_k + c_k d_k) \le f(x_k) + \delta_1 c_k g_k^T d_k - \delta_2 c_k^2 \|d_k\|^2.
\]

In other words, when k is sufficiently large, the initial steplength \alpha_k = c_k satisfies the Armijo-type line search condition (2.2).

We now prepare to show the n-step quadratic convergence of the RMFR method, which requires the following additional assumption.

Assumption (B)

(H3) In some neighborhood N of \Omega, f is three times continuously differentiable.

Define the quadratic function
\[
\hat f_{kr}(x) = f(x_{kr}) + \nabla f(x_{kr})^T (x - x_{kr}) + \frac{1}{2}(x - x_{kr})^T \nabla^2 f(x_{kr})\,(x - x_{kr}). \tag{3.15}
\]
Let \{x^i_{kr}\} and \{d^i_{kr}\} be the iterates and directions generated by the RMFR method applied to the quadratic function \hat f_{kr} with initial point x^0_{kr} = x_{kr}. Specifically, the sequence \{x^i_{kr}\} is generated by the following process:

\[
x^0_{kr} = x_{kr}, \qquad x^{i+1}_{kr} = x^i_{kr} + \alpha^i_{kr} d^i_{kr}, \quad i = 0, 1, \ldots,
\]

\[
d^i_{kr} = \begin{cases} -g^0_{kr}, & \text{if } i = 0,\\ -\theta^i_{kr}\, g^i_{kr} + \beta^i_{kr}\, d^{i-1}_{kr}, & \text{if } i \ge 1, \end{cases}
\]

where

\[
\beta^i_{kr} = \frac{\|g^i_{kr}\|^2}{\|g^{i-1}_{kr}\|^2}, \qquad \theta^i_{kr} = \frac{(d^{i-1}_{kr})^T y^{i-1}_{kr}}{\|g^{i-1}_{kr}\|^2}, \qquad y^{i-1}_{kr} = g^i_{kr} - g^{i-1}_{kr},
\]

and g^i_{kr} denotes the gradient of \hat f_{kr} at x^i_{kr}. As we have shown in Theorem 3.1, when k is sufficiently large, the steplength c_k is always accepted. Since \hat f_{kr} is a quadratic function, c_k coincides with the steplength obtained by the exact line search. Consequently, we have (g^i_{kr})^T d^{i-1}_{kr} = 0. Moreover, there is an index j(k) \le n such that x^{j(k)}_{kr} is the minimizer of \hat f_{kr}. In order to prove the n-step quadratic convergence of the RMFR method, it is necessary to first prove several lemmas, which are similar to Lemmas (A.1)–(A.10) in [6].

Lemma 3.3. Let Assumptions (A) and (B) hold and the sequence \{x_k\} be generated by the RMFR method. Then
\[
\|\nabla^2 f(x_{k+i}) - \nabla^2 f(x_k)\| = O(\|d_k\|), \tag{3.16}
\]

\[
\|\widehat B_{k+i} - \nabla^2 f(x_k)\| = O(\|d_k\|). \tag{3.17}
\]
Proof. First, we prove (3.16). Let \bar M be an upper bound of \|\nabla^3 f\| on N, i.e., \|\nabla^3 f(x)\| \le \bar M for all x \in N. From Lemma 3.1 and (2.9), we have

\[
\begin{aligned}
\|\nabla^2 f(x_{k+i}) - \nabla^2 f(x_k)\| &\le \sum_{l=0}^{i-1}\|\nabla^2 f(x_{k+l+1}) - \nabla^2 f(x_{k+l})\| = \sum_{l=0}^{i-1}\Bigl\|\int_0^1 \nabla^3 f(x_{k+l} + \gamma\alpha_{k+l} d_{k+l})\,\alpha_{k+l} d_{k+l}\, d\gamma\Bigr\|\\
&\le \sum_{l=0}^{i-1} \sup_{\gamma\in(0,1)}\|\nabla^3 f(x_{k+l} + \gamma\alpha_{k+l} d_{k+l})\|\,|\alpha_{k+l}|\,\|d_{k+l}\| \le \frac{\bar M}{m}\sum_{l=0}^{i-1}\|d_{k+l}\| = O(\|d_k\|).
\end{aligned}
\]

Next, we prove (3.17). Indeed, we have from Lemma 3.1 and (3.16)

\[
\begin{aligned}
\|\widehat B_{k+i} - \nabla^2 f(x_k)\| &\le \Bigl\|\int_0^1 \bigl[\nabla^2 f(x_{k+i} + \gamma\alpha_{k+i} d_{k+i}) - \nabla^2 f(x_{k+i})\bigr]\, d\gamma\Bigr\| + \|\nabla^2 f(x_{k+i}) - \nabla^2 f(x_k)\|\\
&\le \int_0^1 \Bigl\|\int_0^1 \nabla^3 f(x_{k+i} + \eta\gamma\alpha_{k+i} d_{k+i})\,\gamma\alpha_{k+i} d_{k+i}\, d\eta\Bigr\|\, d\gamma + O(\|d_k\|) \le \frac{\bar M}{m}\|d_{k+i}\| + O(\|d_k\|) = O(\|d_k\|).
\end{aligned}
\]

The proof is complete.
Lemma 3.4. Let Assumptions (A) and (B) hold and the sequence \{x_k\} be generated by the RMFR method. Then

\[
\|d_{k+i+1} - d_k^{i+1}\| = O(\|d_{k+i} - d_k^i\|) + O(\|g_{k+i+1} - g_k^{i+1}\|) + O(\|g_{k+i} - g_k^i\|) + O(\|d_k\|^2), \tag{3.18}
\]

where, here and in Lemmas 3.5 and 3.6, g_k^i and d_k^i abbreviate g^i_{kr} and d^i_{kr} with k standing for the restart index kr of the current cycle.
Proof. By the definition of d_k, we have

\[
\|d_{k+i+1} - d_k^{i+1}\| = \bigl\|-\theta_{k+i+1} g_{k+i+1} + \beta_{k+i+1} d_{k+i} + \theta_k^{i+1} g_k^{i+1} - \beta_k^{i+1} d_k^i\bigr\| \le \|\theta_{k+i+1} g_{k+i+1} - \theta_k^{i+1} g_k^{i+1}\| + \|\beta_{k+i+1} d_{k+i} - \beta_k^{i+1} d_k^i\|.
\]

First, we prove that

\[
\|\beta_{k+i+1} d_{k+i} - \beta_k^{i+1} d_k^i\| = O(\|d_{k+i} - d_k^i\|) + O(\|g_{k+i+1} - g_k^{i+1}\|) + O(\|g_{k+i} - g_k^i\|) + O(\|d_k\|^2). \tag{3.19}
\]

From Lemma 3.1 and Theorem 3.1 (\alpha_k = c_k for all k sufficiently large), we have \beta_{k+1} = 1 + (g_{k+1} + g_k)^T \widehat B_k d_k/(d_k^T B_k d_k). Since d_{k+i}^T \widehat B_{k+i} d_{k+i}/(d_{k+i}^T B_{k+i} d_{k+i}) = 1 + O(\|d_{k+i}\|) (see the proof of (3.22) below), replacing B_{k+i} by \widehat B_{k+i} in the denominator perturbs \beta_{k+i+1} d_{k+i} only by O(\|d_k\|^2), which is absorbed in (3.19). So we obtain

\[
\begin{aligned}
\|\beta_{k+i+1} d_{k+i} - \beta_k^{i+1} d_k^i\| &\le \|d_{k+i} - d_k^i\| + \Bigl\|\frac{g_{k+i+1}^T \widehat B_{k+i} d_{k+i}}{d_{k+i}^T \widehat B_{k+i} d_{k+i}}\, d_{k+i} - \frac{(g_k^{i+1})^T \nabla^2 f(x_k) d_k^i}{(d_k^i)^T \nabla^2 f(x_k) d_k^i}\, d_k^i\Bigr\|\\
&\quad + \Bigl\|\frac{g_{k+i}^T \widehat B_{k+i} d_{k+i}}{d_{k+i}^T \widehat B_{k+i} d_{k+i}}\, d_{k+i} - \frac{(g_k^i)^T \nabla^2 f(x_k) d_k^i}{(d_k^i)^T \nabla^2 f(x_k) d_k^i}\, d_k^i\Bigr\| + O(\|d_k\|^2).
\end{aligned}
\]

Letting \bar c_{k+i} = (d_{k+i}^T \widehat B_{k+i} d_{k+i})\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr), noting that 1/\bar c_{k+i} \le 1/(m^2\|d_{k+i}\|^2\|d_k^i\|^2), \|g_{k+i+1}\| = O(\|d_{k+i}\|) and \|g_k^{i+1}\| = O(\|d_k^i\|) (by Lemma 3.1), and \|\widehat B_{k+i} - \nabla^2 f(x_k)\| = O(\|d_k\|) (by (3.17)), we get

\[
\begin{aligned}
&\Bigl\|\frac{g_{k+i+1}^T \widehat B_{k+i} d_{k+i}}{d_{k+i}^T \widehat B_{k+i} d_{k+i}}\, d_{k+i} - \frac{(g_k^{i+1})^T \nabla^2 f(x_k) d_k^i}{(d_k^i)^T \nabla^2 f(x_k) d_k^i}\, d_k^i\Bigr\|\\
&\quad= \frac{1}{\bar c_{k+i}}\Bigl\| g_{k+i+1}^T \widehat B_{k+i} d_{k+i}\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr) d_{k+i} - (g_k^{i+1})^T \nabla^2 f(x_k) d_k^i\bigl(d_{k+i}^T \widehat B_{k+i} d_{k+i}\bigr) d_k^i \Bigr\|\\
&\quad\le \frac{1}{\bar c_{k+i}}\Bigl[\bigl\|(g_{k+i+1} - g_k^{i+1})^T \widehat B_{k+i} d_{k+i}\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr) d_{k+i}\bigr\| + \bigl\|(g_k^{i+1})^T \widehat B_{k+i}(d_{k+i} - d_k^i)\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr) d_{k+i}\bigr\|\\
&\qquad+ \bigl\|(g_k^{i+1})^T (\widehat B_{k+i} - \nabla^2 f(x_k)) d_k^i\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr) d_{k+i}\bigr\| + \bigl\|(g_k^{i+1})^T \nabla^2 f(x_k) d_k^i\bigl((d_k^i - d_{k+i})^T \nabla^2 f(x_k) d_k^i\bigr) d_{k+i}\bigr\|\\
&\qquad+ \bigl\|(g_k^{i+1})^T \nabla^2 f(x_k) d_k^i\bigl(d_{k+i}^T \nabla^2 f(x_k)(d_k^i - d_{k+i})\bigr) d_{k+i}\bigr\| + \bigl\|(g_k^{i+1})^T \nabla^2 f(x_k) d_k^i\bigl(d_{k+i}^T (\nabla^2 f(x_k) - \widehat B_{k+i}) d_{k+i}\bigr) d_{k+i}\bigr\|\\
&\qquad+ \bigl\|(g_k^{i+1})^T \nabla^2 f(x_k) d_k^i\bigl(d_{k+i}^T \widehat B_{k+i} d_{k+i}\bigr)(d_{k+i} - d_k^i)\bigr\|\Bigr]\\
&\quad= O(\|g_{k+i+1} - g_k^{i+1}\|) + O(\|d_{k+i} - d_k^i\|) + O(\|d_k\|^2). \tag{3.20}
\end{aligned}
\]

Treating the term involving g_{k+i} and g_k^i in exactly the same way, we get

\[
\Bigl\|\frac{g_{k+i}^T \widehat B_{k+i} d_{k+i}}{d_{k+i}^T \widehat B_{k+i} d_{k+i}}\, d_{k+i} - \frac{(g_k^i)^T \nabla^2 f(x_k) d_k^i}{(d_k^i)^T \nabla^2 f(x_k) d_k^i}\, d_k^i\Bigr\| = O(\|g_{k+i} - g_k^i\|) + O(\|d_{k+i} - d_k^i\|) + O(\|d_k\|^2). \tag{3.21}
\]
From (3.20) and (3.21), we obtain the result (3.19). Next, we prove

\[
\|\theta_{k+i+1} g_{k+i+1} - \theta_k^{i+1} g_k^{i+1}\| = O(\|g_{k+i+1} - g_k^{i+1}\|) + O(\|d_k\|^2). \tag{3.22}
\]

Let B_k and \widehat B_k be defined as in the proofs of Lemma 2.1 and Lemma 3.1. Since \epsilon_k \to 0 and, under Assumptions (A) and (B), \nabla^2 f is Lipschitz continuous on N, we have

\[
\Bigl|\frac{d_{k+i}^T \widehat B_{k+i} d_{k+i}}{d_{k+i}^T B_{k+i} d_{k+i}} - 1\Bigr| \le \frac{|d_{k+i}^T (\widehat B_{k+i} - B_{k+i}) d_{k+i}|}{m\|d_{k+i}\|^2} \le \frac{1}{m}\Bigl\|\int_0^1 \bigl[\nabla^2 f(x_{k+i} + s\alpha_{k+i} d_{k+i}) - \nabla^2 f(x_{k+i} + s\epsilon_{k+i} d_{k+i})\bigr]\, ds\Bigr\| \le \frac{L}{2m}\,|\alpha_{k+i} - \epsilon_{k+i}|\,\|d_{k+i}\| = O(\|d_{k+i}\|),
\]
where L is the Lipschitz constant of \nabla^2 f on the set N. Then, since \alpha_k = c_k for all k sufficiently large, we get by the mean value theorem

\[
g_{k+i+1}^T d_{k+i} = g(x_{k+i} + c_{k+i} d_{k+i})^T d_{k+i} = g_{k+i}^T d_{k+i} + c_{k+i}\, d_{k+i}^T \widehat B_{k+i} d_{k+i} = \Bigl(\frac{d_{k+i}^T \widehat B_{k+i} d_{k+i}}{d_{k+i}^T B_{k+i} d_{k+i}} - 1\Bigr)\|g_{k+i}\|^2 = O(\|d_{k+i}\|)\,\|g_{k+i}\|^2,
\]

where we used g_{k+i}^T d_{k+i} = -\|g_{k+i}\|^2 and c_{k+i}\, d_{k+i}^T B_{k+i} d_{k+i} = \|g_{k+i}\|^2. Then we have

\[
\begin{aligned}
\|\theta_{k+i+1} g_{k+i+1} - \theta_k^{i+1} g_k^{i+1}\| &= \Bigl\|\frac{d_{k+i}^T(g_{k+i+1} - g_{k+i})}{\|g_{k+i}\|^2}\, g_{k+i+1} - \frac{(d_k^i)^T(g_k^{i+1} - g_k^i)}{\|g_k^i\|^2}\, g_k^{i+1}\Bigr\|\\
&\le \|g_{k+i+1} - g_k^{i+1}\| + \Bigl\|\frac{g_{k+i+1}^T d_{k+i}}{\|g_{k+i}\|^2}\, g_{k+i+1}\Bigr\| \le \|g_{k+i+1} - g_k^{i+1}\| + \frac{O(\|d_{k+i}\|)\,\|g_{k+i}\|^2}{\|g_{k+i}\|^2}\,\|g_{k+i+1}\|\\
&= \|g_{k+i+1} - g_k^{i+1}\| + O(\|d_k\|^2),
\end{aligned}
\]

where the first inequality uses \theta_{k+i+1} = 1 + g_{k+i+1}^T d_{k+i}/\|g_{k+i}\|^2 and \theta_k^{i+1} = 1 (because (g_k^{i+1})^T d_k^i = 0 by the exact line search on the quadratic model). The proof is complete.

Lemma 3.5. Let Assumptions (A) and (B) hold and the sequence \{x_k\} be generated by the RMFR method. Then we have

\[
\|g_{k+i+1} - g_k^{i+1}\| \le \|g_{k+i} - g_k^i\| + O(\|d_k\|^2) + M\|\alpha_{k+i} d_{k+i} - \alpha_k^i d_k^i\|. \tag{3.23}
\]
Proof. By the definition of g_k, we have

\[
\begin{aligned}
\|g_{k+i+1} - g_k^{i+1}\| &= \bigl\|g_{k+i} + \alpha_{k+i} \widehat B_{k+i} d_{k+i} - g_k^i - \alpha_k^i \nabla^2 f(x_k) d_k^i\bigr\|\\
&\le \|g_{k+i} - g_k^i\| + \bigl\|(\nabla^2 f(x_k) - \widehat B_{k+i})\,\alpha_k^i d_k^i\bigr\| + \bigl\|\widehat B_{k+i}(\alpha_{k+i} d_{k+i} - \alpha_k^i d_k^i)\bigr\|\\
&\le \|g_{k+i} - g_k^i\| + \frac{1}{m}\|\nabla^2 f(x_k) - \widehat B_{k+i}\|\,\|d_k^i\| + M\|\alpha_{k+i} d_{k+i} - \alpha_k^i d_k^i\|\\
&\le \|g_{k+i} - g_k^i\| + O(\|d_k\|^2) + M\|\alpha_{k+i} d_{k+i} - \alpha_k^i d_k^i\|.
\end{aligned}
\]
Lemma 3.6. Let Assumptions (A) and (B) hold and the sequence \{x_k\} be generated by the RMFR method. Then we have

\[
\|\alpha_{k+i} d_{k+i} - \alpha_k^i d_k^i\| \le O(\|g_{k+i} - g_k^i\|) + O(\|d_{k+i} - d_k^i\|) + O(\|d_k\|^2). \tag{3.24}
\]
Proof. By the definition of \alpha_k in the RMFR method, we have

\[
\begin{aligned}
\|\alpha_{k+i} d_{k+i} - \alpha_k^i d_k^i\| &= \Bigl\|\frac{\|g_{k+i}\|^2}{d_{k+i}^T \widehat B_{k+i} d_{k+i}}\, d_{k+i} - \frac{\|g_k^i\|^2}{(d_k^i)^T \nabla^2 f(x_k) d_k^i}\, d_k^i\Bigr\|\\
&= \frac{1}{\bar c_{k+i}}\bigl\|\,\|g_{k+i}\|^2\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr) d_{k+i} - \|g_k^i\|^2\bigl(d_{k+i}^T \widehat B_{k+i} d_{k+i}\bigr) d_k^i\,\bigr\|\\
&\le \frac{1}{\bar c_{k+i}}\Bigl[\,|g_{k+i}^T(g_{k+i} - g_k^i)|\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr)\|d_{k+i}\| + |g_{k+i}^T g_k^i|\,\bigl|(d_k^i - d_{k+i})^T \nabla^2 f(x_k) d_k^i\bigr|\,\|d_{k+i}\|\\
&\qquad + |(g_{k+i} - g_k^i)^T g_k^i|\bigl((d_k^i)^T \nabla^2 f(x_k) d_k^i\bigr)\|d_{k+i}\| + \|g_k^i\|^2\bigl|d_{k+i}^T \nabla^2 f(x_k)(d_k^i - d_{k+i})\bigr|\,\|d_{k+i}\|\\
&\qquad + \|g_k^i\|^2\bigl(d_{k+i}^T \nabla^2 f(x_k) d_{k+i}\bigr)\|d_k^i - d_{k+i}\| + \|g_k^i\|^2\bigl|d_{k+i}^T(\nabla^2 f(x_k) - \widehat B_{k+i}) d_{k+i}\bigr|\,\|d_k^i\|\,\Bigr],
\end{aligned}
\]

where \bar c_{k+i} is given in Lemma 3.4. It is not difficult to get (3.24) in a way similar to the derivation of (3.20).

The next theorem plays an important role in the proof of the n-step quadratic convergence of the RMFR method.

Theorem 3.2. Let Assumptions (A) and (B) hold and the sequence \{x_k\} be generated by the RMFR method. Then we have
\[
\|\alpha_{kr+i} d_{kr+i} - \alpha^i_{kr} d^i_{kr}\| = O(\|d_{kr}\|^2), \qquad i = 0, 1, \ldots, j(k) - 1, \tag{3.25}
\]

\[
\|\alpha_{kr+i} d_{kr+i} - \alpha^i_{kr} d^i_{kr}\| = O(\|x_{kr} - x^*\|^2), \qquad i = 0, 1, \ldots, j(k) - 1. \tag{3.26}
\]
kg kþi g ik k ¼ Oðkdk k2 Þ;
ð3:27Þ
i
kdkþi dk k ¼ Oðkdk k2 Þ;
ð3:28Þ
i
kakþi dkþi aik dk k ¼ Oðkdk k2 Þ:
ð3:29Þ g 0k ; dk
0 dk .
For i ¼ 0, the relations (3.27)–(3.29) follow from (3.24) and the fact g k ¼ ¼ Suppose that (3.27)–(3.29) hold for some i P 0. We are going to show they hold for i þ 1. From (3.23) and the induction assumption, we have i
2 2 i i kg kþiþ1 g iþ1 k k 6 kg kþi g k k þ Oðkdk k Þ þ Mðkakþi dkþi ak dk kÞ ¼ Oðkdk k Þ:
ð3:30Þ
We also have from (3.18) iþ1
kdkþiþ1 dk k ¼ Oðkdk k2 Þ:
ð3:31Þ
The Eqs. (3.24) and (3.30) imply iþ1
kakþiþ1 dkþiþ1 aiþ1 k d
iþ1
k ¼ Oðkg kþiþ1 g kiþ1 kÞ þ Oðkdkþiþ1 dk kÞ þ Oðkdk k2 Þ ¼ Oðkdk k2 Þ:
ð3:32Þ
The above process has shown (3.27)–(3.29) which implies (3.25). The relation (3.26) then follows from (3.25) and the fact kdkr k ¼ kg kr k 6 Mkxkr x k. Now we establish the n-step quadratic convergence of the RMFR method. h Theorem 3.3. Under the conditions of Assumptions A and B, the sequence fxk g generated by Algorithm 2 is n-step quadratically convergent. That is, there exists a constant c > 0, such that
\[
\limsup_{k\to\infty}\, \frac{\|x_{kr+n} - x^*\|}{\|x_{kr} - x^*\|^2} \le c < \infty. \tag{3.33}
\]
Proof. Since the sequence \{f(x_k)\} is decreasing and j(k) \le n, we have f(x_{kr+n}) - f(x^*) \le f(x_{kr+j(k)}) - f(x^*). This together with (2.7) implies

\[
\|x_{kr+n} - x^*\| \le (M/m)^{(n-j(k))/2}\,\|x_{kr+j(k)} - x^*\|. \tag{3.34}
\]

On the other hand, we have

\[
\|x_{kr+j(k)} - x^{j(k)}_{kr}\| = \Bigl\|\sum_{i=0}^{j(k)-1}\bigl[(x_{kr+i+1} - x_{kr+i}) - (x^{i+1}_{kr} - x^i_{kr})\bigr]\Bigr\| = \Bigl\|\sum_{i=0}^{j(k)-1}\bigl(\alpha_{kr+i} d_{kr+i} - \alpha^i_{kr} d^i_{kr}\bigr)\Bigr\| \le \sum_{i=0}^{j(k)-1}\|\alpha_{kr+i} d_{kr+i} - \alpha^i_{kr} d^i_{kr}\| = O(\|x_{kr} - x^*\|^2), \tag{3.35}
\]

where the last equality follows from (3.26). Therefore, we obtain

\[
\|x_{kr+j(k)} - x^*\| \le \|x_{kr+j(k)} - x^{j(k)}_{kr}\| + \|x^{j(k)}_{kr} - x^*\| = O(\|x_{kr} - x^*\|^2) + \|x^{j(k)}_{kr} - x^*\|. \tag{3.36}
\]

Since x^{j(k)}_{kr} is the exact minimizer of the quadratic function \hat f_{kr}, it can be regarded as the iterate generated by a Newton step starting from x_{kr}. By the quadratic convergence of Newton's method, it is not difficult to get \|x^{j(k)}_{kr} - x^*\| = O(\|x_{kr} - x^*\|^2). Consequently, the n-step quadratic convergence of \{x_k\} follows from (3.34)–(3.36).
Remark. The proof of Theorem 3.3 is similar to the proofs in [6,14]; the difference is that here the line search is inexact (see the Appendices of [6,14]).
4. Numerical experiments

In this section, we report some numerical experiments. We compare the performance of the RMFR method with that of the MFR method [2], the MPRP method [15] and the CG-DESCENT method [16]. The test problems are unconstrained problems from Andrei [17]. All codes were written in Fortran and run on an IBM T60 PC with two 1.83 GHz CPU processors and 2.5 GB of RAM. The CG-DESCENT code can be obtained from Hager's web page at http://www.math.ufl.edu/hager/papers/CG.

As pointed out by Powell [10], for large scale problems a restart strategy with r \ge n becomes meaningless; Powell suggested that a relatively small value of r is preferable when a restarted conjugate gradient method is applied to large scale problems. Following Powell's suggestion, we try to find a good value of r for large scale problems through numerical experiments. To this end, we selected 83 problems from [17], 36 of which are from the CUTE library. We set n = 1000 for all problems and tested the RMFR method with r = 10, 50, 100, 400, 800. The parameters in the RMFR method were set as follows: \rho = 0.5, \delta_1 = 10^{-4}, \delta_2 = 0 and \epsilon_k \equiv 10^{-8}; we stop the iteration if \|g_k\| \le 10^{-6} or the iteration number exceeds 5 \times 10^4. Fig. 1 shows the performance of the RMFR method for the different values of r, evaluated using the performance profiles of Dolan and Moré [18]. We can see that r = 50 seems to be the best choice when n is large. Keeping this in mind, we then compare the performance of the RMFR method (with r = 50) with that of the MFR method [2], the MPRP method [15] and the CG-DESCENT method [16] in terms of CPU time, the number of function evaluations and the number of gradient evaluations on all 83 test problems.
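For reference, a Dolan–Moré performance profile [18] can be computed as in the sketch below (our illustration, not the authors' code). The matrix T is a hypothetical input holding, say, CPU times, with one row per problem, one column per solver, and np.inf marking failures; plotting each column of the returned profiles against taus gives curves of the kind shown in Figs. 1–4.

import numpy as np

def performance_profile(T):
    """Dolan-More performance profile [18] for a (problems x solvers)
    matrix T of positive costs; np.inf marks a failure."""
    best = T.min(axis=1, keepdims=True)        # best cost per problem
    ratios = T / best                           # performance ratios r_{p,s}
    taus = np.unique(ratios[np.isfinite(ratios)])
    # rho_s(tau) = fraction of problems solved within tau times the best cost
    profiles = np.array([[np.mean(ratios[:, s] <= t)
                          for s in range(T.shape[1])] for t in taus])
    return taus, profiles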
Fig. 1. Performance profiles of RMFR with different values of r (r = 10, 50, 100, 400, 800), based on CPU time.
Fig. 2. Performance profiles based on CPU time (n = 1000; methods: CG-DESCENT, MPRP, MFR, RMFR).
Fig. 3. Performance profiles for the number of function evaluations (n = 1000).
Fig. 4. Performance profiles for the number of gradient evaluations.
Figs. 2–4 show the performance of these methods in terms of CPU time, the number of function evaluations and the number of gradient evaluations, respectively. Fig. 2 shows that the CG-DESCENT method solves about 52% of the test problems with the least CPU time, while the RMFR method outperforms the others for performance ratios between 2 and 8. Fig. 4 shows that the MPRP method solves about 40% of the test problems with the fewest gradient evaluations, while the RMFR method outperforms the others for ratios between 2 and 12. Fig. 3 shows that the RMFR method is competitive with the others in the number of function evaluations.

References

[1] R. Fletcher, C. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (1964) 149–154.
[2] L. Zhang, W.J. Zhou, D.H. Li, Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search, Numer. Math. 104 (2006) 561–572.
[3] H.P. Crowder, P. Wolfe, Linear convergence of the conjugate gradient method, IBM J. Res. Develop. 16 (1972) 431–433.
[4] M.J.D. Powell, Some convergence properties of the conjugate gradient method, Math. Program. 11 (1976) 42–49.
[5] K. Ritter, On the rate of superlinear convergence of a class of variable metric methods, Numer. Math. 35 (1980) 293–313.
[6] A. Cohen, Rate of convergence of several conjugate gradient algorithms, SIAM J. Numer. Anal. 9 (1972) 248–259.
[7] W. Burmeister, Die Konvergenzordnung des Fletcher–Powell-Algorithmus, Z. Angew. Math. Mech. 53 (1973) 693–699.
[8] E.M.L. Beale, A derivation of conjugate gradients, in: Numerical Methods for Nonlinear Optimization, Academic Press, London, 1972, pp. 39–43.
[9] M.F. McGuire, P. Wolfe, Evaluating a restart procedure for conjugate gradients, Report RC-4382, IBM Research Center, Yorktown Heights, 1973.
[10] M.J.D. Powell, Restart procedures for the conjugate gradient method, Math. Program. 12 (1977) 241–254.
[11] Y.H. Dai, Y.X. Yuan, Convergence properties of the Beale–Powell restart algorithm, Sci. China Ser. A 41 (1998) 1142–1150.
[12] Y.H. Dai, L.Z. Liao, D. Li, On restart procedures for the conjugate gradient method, Numer. Algorithms 35 (2004) 249–260.
[13] D.H. Li, B.S. Tian, n-Step quadratic convergence of the MPRP method with a restart strategy, J. Comput. Appl. Math. 235 (2011) 4978–4990.
[14] Y.X. Yuan, W.Y. Sun, Optimization Theory and Methods, Science Press, Beijing, 1997 (in Chinese).
[15] L. Zhang, W.J. Zhou, D.H. Li, A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence, IMA J. Numer. Anal. 26 (2006) 629–640.
[16] W.W. Hager, H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM J. Optim. 16 (2005) 170–192.
[17] N. Andrei, An unconstrained optimization test functions collection, Adv. Model. Optim. 10 (2008) 147–161.
[18] E.D. Dolan, J.J. Moré, Benchmarking optimization software with performance profiles, Math. Program. 91 (2002) 201–213.