European Journal of Operational Research 180 (2007) 48–56 www.elsevier.com/locate/ejor
Continuous Optimization
A compact limited memory method for large scale unconstrained optimization

Yang Yueting a,b,*, Xu Chengxian b

a Faculty of Science, Beihua University, Jilin, Jilin 132013, PR China
b Faculty of Science, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China

Received 18 March 2004; accepted 16 February 2006; available online 5 June 2006

* Corresponding author. Address: Faculty of Science, Beihua University, Jilin, Jilin 132013, PR China. E-mail addresses: [email protected] (Y. Yueting), [email protected] (X. Chengxian).
Abstract

A compact limited memory method for solving large scale unconstrained optimization problems is proposed. A compact representation of the quasi-Newton updating matrix is derived for use in the limited memory update, in which the vector y_k is replaced by a modified vector ŷ_k so that more of the available information about the function can be employed to increase the accuracy of the Hessian approximations. The global convergence of the proposed method is proved. Numerical tests on commonly used large scale test problems indicate that the proposed compact limited memory method is competitive and efficient.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Large scale optimization; Nonlinear programming; Limited memory quasi-Newton method; BFGS update; Modified quasi-Newton equation
1. Introduction

We consider the unconstrained optimization problem

    min_{x ∈ R^n} f(x),    (1.1)

where f : R^n → R is a nonlinear, continuously differentiable function and the number of variables n is large. Large scale optimization is one of the important research areas in optimization theory and algorithm design. Several classes of algorithms are available for solving large scale optimization problems, for example limited memory quasi-Newton methods, conjugate gradient methods and sparse matrix techniques.

Let x_0 be a starting point and H_0 an initial symmetric positive definite matrix. A normal quasi-Newton method generates sequences {x_k} and {H_k} by the iteration

    x_{k+1} = x_k + α_k d_k
and a quasi-Newton updating formula for H_k, where d_k is a search direction generated by d_k = −H_k g_k, α_k is a step length along that direction, g_k = ∇f(x_k) is the gradient of f(x) at x_k, and H_k is an approximation to the inverse of the Hessian G_k = ∇²f(x_k) of f(x). The updating matrix H_{k+1} is usually required to satisfy the quasi-Newton equation

    H_{k+1} y_k = s_k,

where s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k.

Due to the denseness of quasi-Newton updating matrices, general quasi-Newton updates are not suitable for large scale optimization. Over the past three decades, attempts have been made to generalize normal quasi-Newton updates to limited memory methods, and many approaches have been proposed in which the main properties of quasi-Newton updates are maintained while the amount of storage required by the algorithm can be controlled by the user. Since the BFGS update is widely used to solve general nonlinear minimization problems, most studies of limited memory methods concentrate on the limited memory BFGS (L-BFGS) method (see [1,3,6–8,16]).

A drawback of the BFGS update is that it exploits only gradient information, while the available information in the function values is neglected. Some efficient attempts have been made to modify the usual quasi-Newton methods so that both gradient and function value information is used (see [4,9,14,15]). Recently, a modification of the quasi-Newton equation that includes not only gradient information but also function value information was proposed by Zhang and Xu in [16] and Xu and Zhang in [13]. The modified quasi-Newton equation has the form

    H_{k+1} ŷ_k = s_k,    (1.2)

where ŷ_k = (1 + θ_k/(s_k^T y_k)) y_k and θ_k = 6(f(x_k) − f(x_{k+1})) + 3(g_k + g_{k+1})^T s_k. It is proved in [16] that the modified quasi-Newton equation achieves a higher order of accuracy in approximating the second-order curvature of the objective function. Let λ_k = 1 + θ_k/(s_k^T y_k); then ŷ_k = λ_k y_k. When the BFGS update satisfies the modified
quasi-Newton equation (1.2), the modified BFGS (MBFGS) update simplifies to

    H_{k+1} = (I − (s_k ŷ_k^T)/(s_k^T ŷ_k)) H_k (I − (ŷ_k s_k^T)/(s_k^T ŷ_k)) + (s_k s_k^T)/(s_k^T ŷ_k)
            = V_k^T H_k V_k + (1/λ_k) ρ_k s_k s_k^T,    (1.3)

where

    ρ_k = 1/(y_k^T s_k),   V_k = I − ρ_k y_k s_k^T.    (1.4)
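As a concrete illustration of (1.2)–(1.4) (ours, not part of the original paper), the following Python/NumPy sketch computes θ_k and λ_k from one step and performs a single dense MBFGS update of the inverse Hessian approximation; the quadratic test function and all names are illustrative assumptions.

```python
import numpy as np

def mbfgs_update(H, s, y, f_old, f_new, g_old, g_new):
    """One dense MBFGS update of the inverse Hessian approximation H,
    following (1.2)-(1.4): y_k is replaced by y_hat = lambda_k * y_k."""
    sy = s @ y
    theta = 6.0 * (f_old - f_new) + 3.0 * (g_old + g_new) @ s   # theta_k
    lam = 1.0 + theta / sy                                      # lambda_k
    rho = 1.0 / sy                                              # rho_k = 1/(y^T s)
    V = np.eye(len(s)) - rho * np.outer(y, s)                   # V_k
    # H_{k+1} = V_k^T H_k V_k + (1/lambda_k) rho_k s_k s_k^T,  formula (1.3)
    return V.T @ H @ V + (rho / lam) * np.outer(s, s)

# Tiny illustration on a convex quadratic f(x) = 0.5 x^T A x (assumed example);
# on a quadratic, theta_k = 0 and the MBFGS update reduces to the BFGS update.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x
x0 = np.array([1.0, 1.0]); x1 = x0 - 0.1 * g(x0)
H1 = mbfgs_update(np.eye(2), x1 - x0, g(x1) - g(x0), f(x0), f(x1), g(x0), g(x1))
```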
The MBFGS update differs from the normal BFGS update only in the last term, where ρ_k s_k s_k^T is multiplied by 1/λ_k, but a higher order approximation of the curvature along the step direction is obtained. Moreover, for a twice continuously differentiable function, if {x_k} converges to a point x* at which g(x*) = 0 and ∇²f(x*) is positive definite, then lim_{k→∞} θ_k = 0 and hence lim_{k→∞} λ_k = 1, so the heredity of positive definiteness is retained by the MBFGS update for sufficiently large k. Furthermore, other properties of the modified quasi-Newton methods, in particular the global and superlinear convergence properties, are analyzed in [16] and [13], respectively. Numerical experiments in [16] also show that the MBFGS update is more competitive than the BFGS update and other quasi-Newton updates.

In [1,3], the authors describe a compact form and a two-loop recursion of the limited memory method, respectively. As indicated in [1], the compact form is better suited to adaptation for the sparse case. Moreover, the compact form has an analogue for the direct update, whereas the two-loop recursion does not. These observations motivate us to derive the compact form of the limited memory MBFGS (L-MBFGS) update, which is expected to provide a new tool for solving various kinds of large scale optimization problems.

The rest of the paper is organized as follows. In Section 2, the compact representation of the limited memory MBFGS update is presented and the limited memory MBFGS algorithm based on the compact formula is described. In Section 3, the global convergence result on twice continuously differentiable and convex functions and the
R-linear convergence rate are proved. In the last section, numerical results and comparisons with the normal limited memory BFGS method on large scale test problems are reported; they show the advantages of the proposed limited memory method over the usual one.

2. The compact limited memory MBFGS algorithm

In this section, the compact expression of the MBFGS update is derived and the corresponding limited memory quasi-Newton algorithm is described. Let m̄ be a given small positive integer and m = min{k, m̄}. Suppose that the m pairs {s_i, y_i}_{i=k−m}^{k−1} and a basic matrix H_k^(0) (usually a diagonal matrix) are available at the current iterate x_k. Using the MBFGS formula and the pairs {s_i, y_i}_{i=k−m}^{k−1} to update H_k^(0) m times, the matrix H_k is generated and is then used to compute the search direction d_k. From (1.3), H_k can be written as

    H_k = (V_{k−1}^T ⋯ V_{k−m}^T) H_k^(0) (V_{k−m} ⋯ V_{k−1})
          + (1/λ_{k−m}) ρ_{k−m} (V_{k−1}^T ⋯ V_{k−m+1}^T) s_{k−m} s_{k−m}^T (V_{k−m+1} ⋯ V_{k−1})
          + (1/λ_{k−m+1}) ρ_{k−m+1} (V_{k−1}^T ⋯ V_{k−m+2}^T) s_{k−m+1} s_{k−m+1}^T (V_{k−m+2} ⋯ V_{k−1})
          + ⋯ + (1/λ_{k−1}) ρ_{k−1} s_{k−1} s_{k−1}^T.    (2.1)
According to this recursive formula, a compact representation of the updating matrix H_k can be derived. For convenience of notation, it is assumed for the moment that all k pairs of vectors {s_i, y_i}_{i=0}^{k−1} are stored. Define the n × k matrices S_k and Y_k as

    S_k = [s_0, ..., s_{k−1}],   Y_k = [y_0, ..., y_{k−1}].

Assume that the pairs {s_i, y_i}_{i=0}^{k−1} satisfy s_i^T y_i > 0, and let H_k be obtained by updating H_0 k times with the BFGS formula. In Ref. [1], the authors derive the following compact representation of the L-BFGS update by induction:

    H_k = H_0 + [S_k, H_0 Y_k] [ R_k^{−T}(D_k + Y_k^T H_0 Y_k) R_k^{−1},  −R_k^{−T} ;  −R_k^{−1},  0 ] [ S_k^T ;  Y_k^T H_0 ],    (2.2)

where R_k is the k × k nonsingular matrix

    (R_k)_{i,j} = s_{i−1}^T y_{j−1} if i ≤ j,  and 0 otherwise,    (2.3)

and D_k is the diagonal matrix

    D_k = diag[s_0^T y_0, ..., s_{k−1}^T y_{k−1}].    (2.4)
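To make the compact form concrete, the following sketch (again ours, with illustrative data and names) builds S_k, Y_k, R_k and D_k for a few randomly generated curvature pairs and checks (2.2) against the recursive BFGS formula.

```python
import numpy as np

def compact_bfgs_inverse(H0, S, Y):
    """Inverse Hessian approximation via the compact form (2.2)-(2.4);
    the columns of S and Y are s_0,...,s_{k-1} and y_0,...,y_{k-1}."""
    k = S.shape[1]
    SY = S.T @ Y
    R = np.triu(SY)                          # (2.3): (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j
    D = np.diag(np.diag(SY))                 # (2.4): D_k = diag(s_i^T y_i)
    Rinv = np.linalg.inv(R)
    W = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                  [-Rinv,                              np.zeros((k, k))]])
    U = np.hstack([S, H0 @ Y])
    return H0 + U @ W @ U.T

rng = np.random.default_rng(0)
n, k = 6, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)    # SPD "Hessian"
S = rng.standard_normal((n, k)); Y = A @ S                       # pairs with s_i^T y_i > 0
H0 = np.eye(n)

H = H0.copy()                                # recursive BFGS update, for comparison
for i in range(k):
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (y @ s)
    V = np.eye(n) - rho * np.outer(y, s)
    H = V.T @ H @ V + rho * np.outer(s, s)

print(np.allclose(H, compact_bfgs_inverse(H0, S, Y)))            # True
```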
In fact, the normal BFGS update can be written as

    H_k = M_k + N_k,   k ≥ 1,    (2.5)

where M_k and N_k are defined recursively by

    M_0 = H_0,   M_{k+1} = V_k^T M_k V_k    (2.6)

and

    N_1 = ρ_0 s_0 s_0^T,   N_{k+1} = V_k^T N_k V_k + ρ_k s_k s_k^T.    (2.7)

Then the compact representations

    M_k = (I − S_k R_k^{−T} Y_k^T) H_0 (I − Y_k R_k^{−1} S_k^T),    (2.8)
    N_k = S_k R_k^{−T} D_k R_k^{−1} S_k^T    (2.9)

can be obtained by induction on k.
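A similar illustrative check (not from the paper; data and names are assumptions) confirms that the compact expressions (2.8) and (2.9) reproduce the matrices M_k and N_k defined by the recursions (2.6) and (2.7):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
S = rng.standard_normal((n, k)); Y = A @ S        # pairs with s_i^T y_i > 0
H0 = np.eye(n)

M, N = H0.copy(), np.zeros((n, n))                # recursions (2.6) and (2.7)
for i in range(k):
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (y @ s)
    V = np.eye(n) - rho * np.outer(y, s)
    M = V.T @ M @ V
    N = V.T @ N @ V + rho * np.outer(s, s)

R = np.triu(S.T @ Y); D = np.diag(np.diag(S.T @ Y))
Rinv = np.linalg.inv(R)
M_c = (np.eye(n) - S @ Rinv.T @ Y.T) @ H0 @ (np.eye(n) - Y @ Rinv @ S.T)   # (2.8)
N_c = S @ Rinv.T @ D @ Rinv @ S.T                                          # (2.9)
print(np.allclose(M, M_c), np.allclose(N, N_c))                            # True True
```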
For the MBFGS update (1.3), the only difference from the BFGS update is the last term on the right-hand side of (1.3). Hence H_k in (1.3) can also be written in the form (2.5), where M_k is defined as in (2.6), but N_k is now defined by the recursion

    N_1 = (ρ_0/λ_0) s_0 s_0^T,   N_{k+1} = V_k^T N_k V_k + (ρ_k/λ_k) s_k s_k^T.    (2.10)

Apart from the factor 1/λ_i multiplying each term ρ_i s_i s_i^T in turn, the proof of (2.2) in [1] can be repeated, and a formula of the form (2.9) is obtained provided that D_k is redefined as

    D_k = diag[1/(λ_0 ρ_0), ..., 1/(λ_{k−1} ρ_{k−1})] = diag[s_0^T y_0/λ_0, ..., s_{k−1}^T y_{k−1}/λ_{k−1}].    (2.11)

Consequently, the compact representation of the MBFGS update is given by (2.2), with R_k and D_k defined by (2.3) and (2.11), respectively. Based on the compact form (2.2) with (2.3) and (2.11), we can describe the limited memory implementation of the MBFGS method for large scale unconstrained optimization.

We now return to formula (2.1). Suppose H_k^(0) = γ_k I for some positive scalar γ_k at the iterate x_k, and replace H_0 by H_k^(0). Then the compact form (2.2) corresponding to (2.1) can be rewritten as

    H_k = γ_k I + [S_k, γ_k Y_k] [ R_k^{−T}(D_k + γ_k Y_k^T Y_k) R_k^{−1},  −R_k^{−T} ;  −R_k^{−1},  0 ] [ S_k^T ;  γ_k Y_k^T ],    (2.12)

where now

    S_k = [s_{k−m}, ..., s_{k−1}],   Y_k = [y_{k−m}, ..., y_{k−1}]    (2.13)

and where R_k and D_k are, respectively,

    (R_k)_{i,j} = (s_{k−m+i−1})^T y_{k−m+j−1} if i ≤ j,  and 0 otherwise,    (2.14)
    D_k = diag[s_{k−m}^T y_{k−m}/λ_{k−m}, ..., s_{k−1}^T y_{k−1}/λ_{k−1}].    (2.15)
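Computationally, the only change relative to the standard compact L-BFGS representation is thus the diagonal matrix D_k of (2.15), whose entries are divided by λ_i. A minimal sketch (ours; the stored scalars theta are an assumed input) is:

```python
import numpy as np

def modified_Dk(S, Y, theta):
    """D_k of (2.15): diag(s_i^T y_i / lambda_i) with lambda_i = 1 + theta_i / (s_i^T y_i).
    Passing theta = 0 for every pair recovers the standard L-BFGS matrix D_k of (2.4)."""
    sy = np.einsum('ij,ij->j', S, Y)        # s_i^T y_i for each stored pair
    lam = 1.0 + np.asarray(theta) / sy      # lambda_i for each stored pair
    return np.diag(sy / lam)
```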
Once the new iterate x_{k+1} is generated, the matrix S_{k+1} must be refreshed. If m < k, the matrix S_{k+1} is obtained by removing s_{k−m} from S_k and adding s_k as the last column. In the case m = k, S_{k+1} is obtained simply by appending s_k as the last column of S_k. The matrices Y_{k+1}, R_{k+1} and D_{k+1} are updated in the same fashion.

Algorithm 1 (L-MBFGS method).
Step 1: Give parameter values m̄, 0 < ρ < δ < 1, ε > 0 and τ > 0; choose an initial point x_0 and an initial value γ_0 > 0, and set k = 0.
Step 2: Compute a search direction d_k: if k = 0, set d_k = −g_k; otherwise use Algorithm 2 to calculate d_k = −H_k g_k.
Step 3: Perform a line search to determine a step length α_k and set x_{k+1} = x_k + α_k d_k.
Step 4: Termination test: if |f(x_{k+1}) − f(x_k)| ≤ ε max{1.0, |f(x_k)|} or ‖g(x_{k+1})‖ ≤ τ max{1, ‖x_{k+1}‖}, then stop; otherwise set k = k + 1 and go to Step 2.

In Step 3, the Wolfe conditions

    f(x_k + α_k d_k) ≤ f(x_k) + ρ α_k g_k^T d_k,    (2.16)
    g(x_k + α_k d_k)^T d_k ≥ δ g_k^T d_k    (2.17)

are used to determine the step length α_k. We always try the step length α_k = 1 first.
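For completeness, a generic bracketing line search that returns a step length satisfying (2.16) and (2.17), trying α_k = 1 first, could look as follows; this is a hedged sketch, not the authors' implementation.

```python
import numpy as np

def wolfe_line_search(f, grad, x, d, rho=0.01, delta=0.9, max_iter=50):
    """Return a step length satisfying the Wolfe conditions (2.16)-(2.17);
    d is assumed to be a descent direction and alpha = 1 is tried first."""
    f0, g0d = f(x), grad(x) @ d
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) > f0 + rho * alpha * g0d:      # (2.16) violated: step too long
            hi = alpha
        elif grad(x + alpha * d) @ d < delta * g0d:        # (2.17) violated: step too short
            lo = alpha
        else:
            return alpha
        alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha
```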
In Step 2, when H_k is updated by using (2.12), the value of γ_k is given by γ_0 = 1 and γ_k = y_{k−1}^T s_{k−1}/(y_{k−1}^T y_{k−1}) for k ≥ 1. The search direction d_k is calculated by the following algorithm when k ≥ 1.

Algorithm 2 (Calculation of the search direction for k ≥ 1).
Step 2.1: Set m = min{k, m̄}, and let S_k = [s_{k−m}, ..., s_{k−1}] and Y_k = [y_{k−m}, ..., y_{k−1}].
Step 2.2: Calculate g_k^T g_k, S_k^T g_k and Y_k^T g_k, and update R_k, Y_k^T Y_k and D_k.
Step 2.3: Calculate γ_k, z_k = [S_k^T g_k ;  γ_k Y_k^T g_k] and

    p_k = [ R_k^{−T}(D_k + γ_k Y_k^T Y_k) R_k^{−1},  −R_k^{−T} ;  −R_k^{−1},  0 ] z_k.

Step 2.4: Calculate H_k g_k = γ_k g_k + [S_k, γ_k Y_k] p_k and set d_k = −H_k g_k.

It is easy to see that the L-MBFGS update preserves positive definiteness if and only if s_k^T ŷ_k > 0. Note that the conditions (2.16) and (2.17) ensure s_k^T y_k > 0 in practical calculations. In [16], the authors show that s_k^T ŷ_k > 0 for sufficiently large k when the sequence {x_k} generated by the MBFGS method converges to a strong local minimizer x* of f(x). They also point out that, when the iterate is far away from the local solution, the case s_k^T ŷ_k ≤ 0 may occur, though it happens very rarely. Therefore the safeguard strategy

    θ_k = (ε − 1) s_k^T y_k   if θ_k < (ε − 1) s_k^T y_k    (2.18)

is used in practice to restrict the value of θ_k. In essence, the strategy (2.18) ensures that the updating matrices remain positive definite and that the properties of the BFGS method are retained; thus the search direction d_k is a descent direction and the sequence {x_k} is convergent. Furthermore, since the computational cost of the scalars {λ_i}_{i=k−m}^{k−1} is trivial for large dimension n, the proposed L-MBFGS method does not incur more computational cost than the usual L-BFGS method.
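Putting the pieces of Algorithm 2 together, the direction computation via the compact form (2.12)–(2.15) can be sketched as below. This is an illustrative reading of the algorithm, not the authors' code; the list-of-tuples storage, the names, and the use of a threshold eps to enforce λ_i ≥ ε (the effect of the safeguard (2.18)) are assumptions.

```python
import numpy as np

def lmbfgs_direction(g, pairs, eps=1e-8):
    """Search direction d_k = -H_k g_k via the compact form (2.12)-(2.15).
    `pairs` holds the last m tuples (s, y, theta), oldest first, where theta is
    the scalar of the modified equation (1.2)."""
    if not pairs:
        return -g
    S = np.column_stack([s for s, _, _ in pairs])
    Y = np.column_stack([y for _, y, _ in pairs])
    sy = np.array([s @ y for s, y, _ in pairs])
    lam = np.maximum(1.0 + np.array([t for _, _, t in pairs]) / sy, eps)  # lambda_i >= eps, cf. (2.18)
    m = len(pairs)
    gamma = sy[-1] / (Y[:, -1] @ Y[:, -1])                 # gamma_k = y^T s / y^T y
    R = np.triu(S.T @ Y)                                   # (2.14)
    D = np.diag(sy / lam)                                  # (2.15)
    Rinv = np.linalg.inv(R)
    W = np.block([[Rinv.T @ (D + gamma * (Y.T @ Y)) @ Rinv, -Rinv.T],
                  [-Rinv,                                    np.zeros((m, m))]])
    z = np.concatenate([S.T @ g, gamma * (Y.T @ g)])       # Step 2.3
    Hg = gamma * g + np.hstack([S, gamma * Y]) @ (W @ z)   # Step 2.4
    return -Hg
```

In a full implementation, S, Y, R and D would of course be updated incrementally as new pairs arrive, as described above, rather than rebuilt at every iteration.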
3. Convergence of the algorithm

This section is devoted to showing that the proposed algorithm is convergent on twice continuously differentiable and uniformly convex functions, and that its convergence rate is R-linear. For such functions, the level set L(x_0) = {x | f(x) ≤ f(x_0)} is a bounded closed convex set for any given x_0, f(x) has a unique minimizer x* in L(x_0), and there exist constants M_2 > M_1 > 0 such that the inequalities

    M_1 ≤ ((g_{k+1} − g_k)^T (x_{k+1} − x_k)) / ‖x_{k+1} − x_k‖² ≤ M_2,
    M_1 ≤ ‖g_{k+1} − g_k‖² / ((g_{k+1} − g_k)^T (x_{k+1} − x_k)) ≤ M_2    (3.1)
hold for any x_{k+1}, x_k ∈ L(x_0) with x_{k+1} ≠ x_k. The following lemma is required in the proof of the main convergence result.

Lemma 3.1 ([12]). Assume that ∇f(x) exists and is uniformly continuous, let the angle ξ_k between the search direction d_k and −∇f(x_k) satisfy

    ξ_k ≤ π/2 − μ   for all k,    (3.2)

where μ ∈ (0, π/2) is a constant, and let the step length α_k be determined by the Wolfe conditions (2.16) and (2.17). Then the sequence {x_k} generated by the iteration x_{k+1} = x_k + α_k d_k either terminates after a finite number of iterations or satisfies one of the following: (i) f(x_k) → −∞ (k → ∞); (ii) ∇f(x_k) → 0 (k → ∞).

With the inequalities (3.1) and Lemma 3.1, we can derive the following convergence result for Algorithm 1.
Theorem 3.1. Suppose that f(x) is twice continuously differentiable and uniformly convex. Then the sequence {x_k} generated by Algorithm 1 converges to the unique minimizer x* of f(x) in L(x_0) for any initial point x_0 ∈ R^n, and the convergence rate is R-linear; that is, there is a constant 0 ≤ r < 1 such that

    f(x_k) − f(x*) ≤ r^k (f(x_0) − f(x*)).    (3.3)
Proof. Without loss of generality, we assume that an infinite sequence {x_k} is generated by Algorithm 1. According to (3.1) and Lemma 3.1, we only need to prove that ∇f(x_k) → 0 and to establish the convergence rate. For the former, we must show that there is a constant μ ∈ (0, π/2] satisfying (3.2) for all k ≥ 0. To do so, we estimate upper bounds for ‖H_k‖ and for the trace of B_k = H_k^{−1}, the approximation of the Hessian of f(x), respectively.

Because |2(f(x_k) − f(x_{k+1}) + g_k^T s_k)| = s_k^T ∇²f(x_k + t s_k) s_k for some t ∈ (0, 1), we have

    |θ_k| = |6(f(x_k) − f(x_{k+1})) + 3(g_k + g_{k+1})^T s_k| ≤ 3|s_k^T ∇²f(x_k + t s_k) s_k| + 3|(g_{k+1} − g_k)^T s_k|.

Then, using the first inequality of (3.1), we obtain

    |θ_k| ≤ 6M_2 ‖s_k‖².

Note that the strategy (2.18) implies θ_k ≥ (ε − 1) s_k^T y_k. Combining this with (3.1) and the above inequality, it holds that

    |λ_k| = |1 + θ_k/(s_k^T y_k)| ≤ 1 + 6M_2/M_1.    (3.4)

From Ref. [3], we know that

    ‖V_k‖ = ‖I − ρ_k y_k s_k^T‖ ≤ 1 + √(M_2/M_1).    (3.5)

Moreover, it is clear that

    |γ_k| = |y_{k−1}^T s_{k−1} / (y_{k−1}^T y_{k−1})| ≤ 1/M_1    (3.6)

holds from the second inequality of (3.1). Using (3.4)–(3.6), we obtain

    ‖H_{k+1}‖ ≤ (1/M_1)(1 + √(M_2/M_1))^{2m} ((m + 1)/ε + 1) = M    (3.7)

for any k ≥ 0, where M = (1/M_1)(1 + √(M_2/M_1))^{2m}((m + 1)/ε + 1) > 0 is a constant.

Now let B_k^(0) = (H_k^(0))^{−1}. By an argument similar to that in [3], together with the second inequality of (3.1) and the bound (3.4), it can be shown that

    Tr(B_k^(j)) ≤ Tr(B_k^(j−1)) + (1 + 6M_2/M_1) M_2,   j = 1, 2, ..., m.

Since ‖B_k^(0)‖ ≤ M_2 from the definition of γ_k, the following inequality holds:

    Tr(B_{k+1}) = Tr(B_k^(m)) ≤ M_2 + m(1 + 6M_2/M_1) M_2.

The rest of the proof can be carried out as in [3]. Consequently, it can be proved that there exists μ ∈ (0, π/2] such that ξ_k ≤ π/2 − μ holds for all k, and that

    f(x_k) − f(x*) ≤ r^k (f(x_0) − f(x*)),    (3.8)

where r = 1 − cβ², β = cos(π/2 − μ), and the constant c depends on M_1, β and the parameters ρ, δ of the Wolfe conditions. The proof is completed. □
4. Numerical results

This section reports numerical tests of the L-MBFGS method, whose performance is compared with that of the L-BFGS method. All the experiments in this paper were carried out on a PC with a 1.8 GHz Pentium IV processor and 256 MB SDRAM using MATLAB 6.1. In all the tests, the values ρ = 0.01 and δ = 0.9 in conditions (2.16) and (2.17) are used to determine the step length α_k, and ε = τ = 10^−8 is used in the termination test of Step 4.

Some classical test functions with standard starting points are selected to test the two limited memory methods. These functions are widely used in the literature to test unconstrained optimization algorithms (see [2,5,10,11]). The dimensions of the problems are varied according to the demands of the tests. In the implementation of the proposed algorithm, m̄ = 3 is selected; that is, the most recent three correction pairs {s_i, y_i}_{i=k−3}^{k−1} are used to form the matrix H_k at each iteration.

Table 1 gives the numerical results for 12 test functions with dimensions from 500 to 2000. In the table, the four reported figures (k/nf/ng/t) are the number of iterations, the number of function evaluations, the number of gradient evaluations and the CPU time, and f* stands for the function value at the final iterate. It can be observed from Table 1 that the L-MBFGS method is superior to the L-BFGS method in iteration counts, in the numbers of function and gradient evaluations, and in computation time. For most of the problems, the final function values obtained by the L-MBFGS method are better than those obtained by the L-BFGS method, and the CPU times of the L-MBFGS method are smaller than those of the L-BFGS method for all problems.

Note that the case s_k^T ŷ_k < 0 occurs, and the strategy (2.18) is used, for 4 problems in Table 1. Tracking the iteration process, we find that the case s_k^T ŷ_k < 0 occurs only in the first three iterations. This suggests that the points causing s_k^T ŷ_k < 0 are generally far away from the local solution.

Table 2 presents the numerical results on some test problems with dimensions from 3000 to 30,000. It can be observed that the L-MBFGS method is clearly better than the L-BFGS method.
Table 1
Numerical results 1

No. | Prob.                        | dim  | L-BFGS: k/nf/ng/t; f*                  | L-MBFGS: k/nf/ng/t; f*
1   | probpenl.mod                 | 500  | 10/28/12/0.561000; 1.6269 × 10^−7      | 8/26/10/0.491000; 2.7590 × 10^−7
2   | Separable cubic              | 1000 | 10/12/11/88.38700; 1.0325 × 10^−11     | 9/12/10/70.24100; 5.1395 × 10^−12
3   | Schittkowski function 302 a  | 1000 | 54/73/57/4.41700; 2.6726 × 10^−4       | 43/61/44/3.23500; 3.6865 × 10^−4
4   | Extended Rosenbrock          | 1000 | 32/48/36/2.94400; 1.4173 × 10^−13      | 31/44/33/2.69400; 2.4079 × 10^−14
5   | Extended Powell singular a   | 1000 | 38/57/40/1.52200; 2.0416 × 10^−8       | 35/52/35/1.45200; 2.9827 × 10^−9
6   | Penalty function a           | 1000 | 89/139/106/5.21700; 0.0096861754544    | 46/70/51/3.5500; 0.00968617655283
7   | Variable dimension a         | 1000 | 46/117/48/2.273000; 9.2076 × 10^−4     | 11/171/13/1.76300; 6.7292 × 10^−18
8   | Boundary value               | 1000 | 648/690/650/46.13600; 2.9809 × 10^−4   | 642/680/644/37.67400; 3.1479 × 10^−4
9   | Nearly separable             | 1000 | 87/106/89/549.240000; 1.7652 × 10^−6   | 43/85/51/378.043000; 1.5836 × 10^−7
10  | Allgower                     | 1000 | 13/58/15/60.66700; 2.1090 × 10^−3      | 11/61/13/58.565000; 2.1090 × 10^−3
11  | Chebyquad                    | 1000 | 105/147/113/485200; 0.0078884725716    | 65/156/67/395500; 0.00788706650742
12  | edensch.mod                  | 2000 | 20/28/24/2.394000; 1.2003 × 10^4       | 19/28/22/2.374000; 1.2003 × 10^4

a Denotes that the strategy (2.18) is implemented.
Table 2
Numerical results 2

No. | Prob.                     | dim    | L-BFGS: k/nf/ng/t; f*                    | L-MBFGS: k/nf/ng/t; f*
2   | Separable cubic           | 10,000 | 8/14/10/6337.100; 3.2902 × 10^−9         | 7/12/10/6230.100; 1.3445 × 10^−12
3   | Schittkowski function 302 | 10,000 | 189/332/191/52.665000; 1.8813 × 10^−8    | 77/252/79/35.83100; 4.2642 × 10^−10
4   | Extended Rosenbrock       | 10,000 | 32/48/36/10.15400; 1.4166 × 10^−12       | 31/45/36/9.51300; 5.0095 × 10^−19
5   | Extended Powell singular  | 10,000 | 41/61/42/13.8900; 8.5689 × 10^−8         | 35/53/36/11.51700; 3.6109 × 10^−7
6   | Penalty function          | 10,000 | 65/87/67/19.45800; 0.099001511953        | 46/71/49/14.94200; 0.099001511952
7   | Variable dimension        | 10,000 | 58/148/64/36.04100; 6.1326 × 10^−5       | 47/144/49/28.51100; 1.5201 × 10^−6
13  | bdexp.mod                 | 10,000 | 11/13/13/6.8700; 0.0168                  | 7/9/9/4.907000; 0.0088
14  | cosine.mod                | 10,000 | 16/60/18/6.510000; −9.9975 × 10^3        | 14/57/15/6.647000; −9.9990 × 10^3
15  | engval1.mod               | 5000   | 21/39/23/3.635000; 5548.66841916         | 20/41/22/3.565000; 5548.66841914
16  | freuroth.mod              | 5000   | 24/69/26/16.65300; 608159.18924          | 19/59/21/13.7700; 608159.18904
17  | dixmaana.mod              | 3000   | 7/11/10/1.04100; 1.00000000000000261     | 5/12/8/0.94100; 1.00000000000000801
18  | dixmaanb.mod              | 6000   | 7/11/10/3.42500; 1.00000000000000037     | 6/13/8/3.32500; 1.00000000000000029
19  | dixmaanc.mod              | 9000   | 56/79/63/35.43100; 1.00000000000002393   | 8/14/10/5.818000; 1.00000000000001344
20  | dixmaand.mod a            | 30,000 | 41/57/46/81.74800; 1.00000000000001272   | 37/54/41/76.05900; 1.00000000000000051

a Denotes that the strategy (2.18) is implemented.
A closer examination of different values of the parameter m̄ (m̄ = 3, 5, 7 and 11) has been carried out in order to choose a good value of m̄. For some problems with dim = 1000, a larger m̄ (such as 5, 7, 11, ...) makes the algorithms more efficient and yields more accurate numerical results. For problems with larger dimension, for instance dim = 5000 or 10,000, however, a larger m̄ leads to excessive memory consumption. From the experiments, and in view of the memory requirements, m̄ = 3 is recommended.

The theoretical analysis shows that the proposed L-MBFGS method preserves the properties of the L-BFGS method, and the numerical results show that it outperforms the L-BFGS method. Therefore, the L-MBFGS method is worth recommending and deserves further study.

Acknowledgement

This work is supported by the Key Project of the National Natural Science Foundation of China (10231060).

References

[1] R.H. Byrd, J. Nocedal, R.B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Technical Report NAM-03, Department of Electrical Engineering and Computer Science, Northwestern University, 1992.
[2] H.Y. Benson, CUTEr models. Available from:
[3] D.C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45 (1989) 503–528.
[4] W.D. Lin, C.X. Xu, Global convergence properties of convex Broyden quasi-Newton methods based on the new quasi-Newton equations, presented at the International Conference on Nonlinear Programming and Variational Inequalities, City University of Hong Kong, December 1998.
[5] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7 (1981) 17–41.
[6] J.L. Morales, J. Nocedal, Automatic preconditioning by limited memory quasi-Newton updating, SIAM Journal on Optimization 10 (1996) 1079–1096.
[7] L. Nazareth, A relationship between BFGS and conjugate gradient algorithms and its implications for new algorithms, SIAM Journal on Numerical Analysis 16 (1979) 794–800.
[8] J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation 35 (1980) 773–782.
[9] R.B. Schnabel, T. Chow, Tensor methods for unconstrained optimization using second derivatives, SIAM Journal on Optimization 1 (1991) 293–315.
[10] Ph.L. Toint, Some numerical results using a sparse matrix updating formula in unconstrained optimization, Mathematics of Computation 32 (1978) 839–851.
[11] Ph.L. Toint, Test problems for partially separable optimization and results for the routine PSPMIN, Report 83/14, Department of Mathematics, Facultés Universitaires de Namur, Namur, Belgium, 1983.
[12] Ch.X. Xu, Z.P. Chen, N.C. Li, Advanced Optimization Algorithms, Scientific Publisher, Beijing, 2002 (in Chinese).
[13] Ch.X. Xu, J.Z. Zhang, A survey of quasi-Newton equations and quasi-Newton methods for optimization, Annals of Operations Research 103 (2001) 213–234.
[14] Y. Yuan, R. Byrd, Non-quasi-Newton updates for unconstrained optimization, Journal of Computational Mathematics 13 (1995) 95–107.
[15] J.Z. Zhang, N.Y. Deng, L.H. Chen, New quasi-Newton equation and related methods for unconstrained optimization, Journal of Optimization Theory and Applications 102 (1999) 147–167.
[16] J.Z. Zhang, Ch.X. Xu, Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations, Journal of Computational and Applied Mathematics 137 (2001) 269–278.