The convergence properties of RMIL+ conjugate gradient method under the strong Wolfe line search

Osman Omer Osman Yousif
Department of Mathematics, Faculty of Mathematical and Computer Sciences, University of Gezira, Sudan

Article history: Received 30 September 2018; Revised 16 September 2019; Accepted 23 September 2019

Keywords: Conjugate gradient method; Strong Wolfe line search; Sufficient descent property; Global convergence

Abstract. In Dai (2016), based on the global convergence of the RMIL conjugate gradient method, Dai modified it and called the modified version RMIL+, which has good numerical results and is globally convergent under the exact line search. In this paper, we establish the sufficient descent property and the global convergence of RMIL+ under the strong Wolfe line search. Moreover, numerical results based on well-known optimization problems show that the modified method is competitive when compared with other conjugate gradient methods.

1. Introduction

Due to their simplicity and low memory requirements, conjugate gradient methods are widely used to solve the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1.1)$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function whose gradient is denoted by $g$. To solve (1.1), a conjugate gradient method uses the following iterative formula:

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \qquad (1.2)$$

where the step length $\alpha_k$ is positive and is usually computed by a line search method. The search direction $d_k$ is computed as follows:

$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases} \qquad (1.3)$$

where $\beta_k$ is a coefficient whose different forms determine different conjugate gradient methods. Well-known formulas for $\beta_k$ are the Hestenes–Stiefel (HS) [12], Fletcher–Reeves (FR) [16], Polak–Ribière–Polyak (PRP) [1,4], Conjugate Descent (CD) [15], Liu–Storey (LS) [18], and Dai–Yuan (DY) [19] formulas.
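To make the framework (1.2)–(1.3) concrete, the sketch below is a minimal Python/NumPy illustration of the generic conjugate gradient loop with a pluggable $\beta_k$ rule; the Fletcher–Reeves formula $\beta_k^{FR} = \|g_k\|^2 / \|g_{k-1}\|^2$ is used only as a placeholder. This is our own sketch, not code from the paper (whose experiments are in MATLAB): the names `cg_minimize` and `beta_fr`, and the naive Armijo backtracking used in place of a proper line search, are assumptions of the illustration.

```python
import numpy as np

def beta_fr(g_new, g_old, d_old):
    """Fletcher-Reeves coefficient ||g_k||^2 / ||g_{k-1}||^2.
    (d_old is unused here but kept for a uniform beta_rule signature.)"""
    return g_new.dot(g_new) / g_old.dot(g_old)

def cg_minimize(f, grad, x0, beta_rule=beta_fr, tol=1e-6, max_iter=1000):
    """Generic nonlinear CG iteration (1.2)-(1.3) with a pluggable beta rule.

    The step length below comes from naive Armijo backtracking; the paper
    instead uses a strong Wolfe line search (conditions (1.4)-(1.5))."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g.dot(d) >= 0:                         # safeguard: restart if d is not a descent direction
            d = -g
        alpha, delta = 1.0, 1e-4
        while f(x + alpha * d) > f(x) + delta * alpha * g.dot(d) and alpha > 1e-12:
            alpha *= 0.5                          # backtrack until sufficient decrease holds
        x_new = x + alpha * d                     # iterate update (1.2)
        g_new = grad(x_new)
        d = -g_new + beta_rule(g_new, g, d) * d   # direction update (1.3), k >= 1
        x, g = x_new, g_new
    return x
```

On the convex quadratic $f(x) = x^T x$, for instance, `cg_minimize(lambda x: x.dot(x), lambda x: 2 * x, np.ones(5))` returns a point whose gradient norm is below the tolerance.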



To establish the convergence results of conjugate gradient methods, the step length $\alpha_k$ can be computed using the strong Wolfe line search, in which $\alpha_k$ satisfies the following two conditions:





$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad (1.4)$$

$$\left| g_{k+1}^T d_k \right| \le \sigma \left| g_k^T d_k \right|, \qquad (1.5)$$

where $0 < \delta < \sigma < 1$ and $g_{k+1} = g(x_k + \alpha_k d_k)$. Also, the sufficient descent property, namely,

$$g_k^T d_k \le -C \|g_k\|^2, \quad \forall\, k \ge 0 \text{ and a constant } C > 0, \qquad (1.6)$$

is often needed to establish the global convergence. The global convergence properties of conjugate gradient methods have been studied by many researchers, including Al-Baali [10], Powell [11], Zoutendijk [6], Gilbert and Nocedal [7], Hu and Storey [17], Liu et al. [5], and Touati-Ahmed and Storey [2], among others. Recently, based on Rivaie et al. [13], Dai [20] proposed the RMIL+ conjugate gradient method, whose coefficient $\beta_k$ is defined by

$$\beta_k^{RMIL+} = \begin{cases} \dfrac{g_k^T (g_k - g_{k-1})}{\|d_{k-1}\|^2}, & \text{if } 0 \le g_k^T g_{k-1} \le \|g_k\|^2, \\[6pt] 0, & \text{otherwise.} \end{cases} \qquad (1.7)$$
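For illustration only, the case distinction in (1.7) can be coded as below; this is a Python sketch of ours (the name `beta_rmil_plus` is hypothetical, and `g_new`, `g_old`, `d_old` stand for $g_k$, $g_{k-1}$, $d_{k-1}$ as NumPy arrays). It could be passed, for instance, as the `beta_rule` of the loop sketched earlier in this section.

```python
def beta_rmil_plus(g_new, g_old, d_old):
    """RMIL+ coefficient (1.7): g_k^T (g_k - g_{k-1}) / ||d_{k-1}||^2
    when 0 <= g_k^T g_{k-1} <= ||g_k||^2, and 0 otherwise."""
    inner = g_new.dot(g_old)                  # g_k^T g_{k-1}
    if 0.0 <= inner <= g_new.dot(g_new):      # the restriction in (1.7)
        return g_new.dot(g_new - g_old) / d_old.dot(d_old)
    return 0.0
```

Note that under the stated restriction the numerator equals $\|g_k\|^2 - g_k^T g_{k-1} \in [0, \|g_k\|^2]$, which is the source of the bound (2.1) used in Section 2.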

Dai [20] showed that RMIL+ is globally convergent and gives good numerical results when it is applied under the exact line search. In this paper, under some assumptions, the sufficient descent property and the global convergence of RMIL+ are established under the strong Wolfe line search in the next section. To show the efficiency of the RMIL+ method under the strong Wolfe line search in practice, a numerical experiment together with a discussion is given in Section 3. Finally, a conclusion is given in Section 4.

2. Convergence analysis

In this section, we present the proofs of the sufficient descent property (1.6) and the global convergence of RMIL+ when it is applied under the strong Wolfe line search. These proofs are based on the following two inequalities:

$$0 \le \beta_k^{RMIL+} \le \frac{\|g_k\|^2}{\|d_{k-1}\|^2}, \quad \forall\, k \ge 1, \qquad (2.1)$$

and

$$\frac{\|g_k\|}{\|d_k\|} < 2, \quad \forall\, k \ge 0. \qquad (2.2)$$

Inequality (2.1) is obvious from the definition (1.7). Now, under some conditions, we show that inequality (2.2) holds. Before that, we note that for each real number $\sigma$,

$$0 < \sigma < \frac{1}{4} \;\Rightarrow\; 4\sigma - 2 < -1 \;\Rightarrow\; 2\sigma - 1 < -\frac{1}{2} < 0 \;\Rightarrow\; 2\sigma - 1 < 0. \qquad (2.3)$$

Also,

$$0 < \sigma < \frac{1}{4} \;\Rightarrow\; 2(2\sigma - 1) < -1 \;\Rightarrow\; 2\sigma - 1 < -\frac{1}{2} \;\Rightarrow\; 1 - 2\sigma > \frac{1}{2} \;\Rightarrow\; \frac{1}{1 - 2\sigma} < 2. \qquad (2.4)$$
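As a quick check with the concrete value $\sigma = 10^{-1}$ that is later used in the numerical experiments of Section 3 (our choice of example, not part of the original argument):

$$2\sigma - 1 = 2(0.1) - 1 = -0.8 < 0, \qquad \frac{1}{1 - 2\sigma} = \frac{1}{0.8} = 1.25 < 2.$$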

Theorem 2.1. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by the RMIL+ conjugate gradient method when it is applied under the strong Wolfe line search with $0 < \sigma < \frac{1}{4}$. Then inequality (2.2) holds.

Proof. The proof is by induction. From (1.3), the result is true for $k = 0$. Now suppose that (2.2) is true for some $k \ge 0$. Rewriting Eq. (1.3) for $k + 1$, we have

$$g_{k+1} = -d_{k+1} + \beta_{k+1}^{RMIL+} d_k. \qquad (2.5)$$


Multiplying (2.5) by $g_{k+1}^T$, we get

$$\|g_{k+1}\|^2 = -g_{k+1}^T d_{k+1} + \beta_{k+1}^{RMIL+} g_{k+1}^T d_k.$$

Using the triangle inequality,

$$\|g_{k+1}\|^2 \le \left| g_{k+1}^T d_{k+1} \right| + \left| \beta_{k+1}^{RMIL+} \right| \left| g_{k+1}^T d_k \right|.$$

Applying the strong Wolfe condition (1.5), it follows that

$$\|g_{k+1}\|^2 \le \left| g_{k+1}^T d_{k+1} \right| + \sigma \left| \beta_{k+1}^{RMIL+} \right| \left| g_k^T d_k \right|. \qquad (2.6)$$

Substituting (2.1) into (2.6) and using the Cauchy–Schwarz inequality, we get

$$\|g_{k+1}\|^2 \le \|g_{k+1}\| \|d_{k+1}\| + \sigma \|g_{k+1}\|^2 \frac{\|g_k\|}{\|d_k\|}. \qquad (2.7)$$

Dividing both sides of (2.7) by $\|g_{k+1}\|$, we get

$$\|g_{k+1}\| \le \|d_{k+1}\| + \sigma \|g_{k+1}\| \frac{\|g_k\|}{\|d_k\|}.$$

Applying the induction hypothesis (2.2),

$$\|g_{k+1}\| < \|d_{k+1}\| + 2\sigma \|g_{k+1}\|.$$

Therefore,

$$\|g_{k+1}\| (1 - 2\sigma) < \|d_{k+1}\|.$$

Since $1 - 2\sigma > 0$ (see (2.3)) and from (2.4), we obtain

$$\frac{\|g_{k+1}\|}{\|d_{k+1}\|} < \frac{1}{1 - 2\sigma} < 2.$$

Hence (2.2) is true for $k + 1$, and this completes the proof.



From (2.2), by squaring both sides, we obtain

$$\frac{1}{\|d_k\|^2} < \frac{4}{\|g_k\|^2}, \quad \forall\, k \ge 0, \qquad (2.8)$$

when the RMIL+ conjugate gradient method is applied under the strong Wolfe line search with $0 < \sigma < \frac{1}{4}$. The following theorem establishes the sufficient descent property and will be used to prove the global convergence.

Theorem 2.2. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by the RMIL+ conjugate gradient method under the strong Wolfe line search with $0 < \sigma < \frac{1}{4}$. Then

$$-1 - 2\sigma < \frac{g_k^T d_k}{\|g_k\|^2} < -1 + 2\sigma, \quad \forall\, k \ge 0. \qquad (2.9)$$

Hence, the sufficient descent property (1.6) holds.

Proof. From (1.3), the result is obvious for $k = 0$. Consider $k > 0$. From (1.3), we have

$$g_k^T d_k = -\|g_k\|^2 + \beta_k^{RMIL+} g_k^T d_{k-1}. \qquad (2.10)$$

From the strong Wolfe condition (1.5) and (2.1),

$$-\sigma \beta_k^{RMIL+} \left| g_{k-1}^T d_{k-1} \right| \le \beta_k^{RMIL+} g_k^T d_{k-1} \le \sigma \beta_k^{RMIL+} \left| g_{k-1}^T d_{k-1} \right|. \qquad (2.11)$$

Applying (2.10) and (2.11) together,

$$-\|g_k\|^2 - \sigma \beta_k^{RMIL+} \left| g_{k-1}^T d_{k-1} \right| \le g_k^T d_k \le -\|g_k\|^2 + \sigma \beta_k^{RMIL+} \left| g_{k-1}^T d_{k-1} \right|.$$

Using the Cauchy–Schwarz inequality,

$$-\|g_k\|^2 - \sigma \beta_k^{RMIL+} \|g_{k-1}\| \|d_{k-1}\| \le g_k^T d_k \le -\|g_k\|^2 + \sigma \beta_k^{RMIL+} \|g_{k-1}\| \|d_{k-1}\|. \qquad (2.12)$$

Applying (2.1) and (2.12) together and then dividing both sides by $\|g_k\|^2$,

$$-1 - \sigma \frac{\|g_{k-1}\|}{\|d_{k-1}\|} \le \frac{g_k^T d_k}{\|g_k\|^2} \le -1 + \sigma \frac{\|g_{k-1}\|}{\|d_{k-1}\|}. \qquad (2.13)$$


Applying Theorem 2.1,

$$-1 - 2\sigma < \frac{g_k^T d_k}{\|g_k\|^2} < -1 + 2\sigma. \qquad (2.14)$$

This completes the proof.

To prove the global convergence, we assume that the objective function $f$ satisfies the following assumption.

Assumption 2.1.
(i) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, where $x_0$ is the starting point.
(ii) In some neighborhood $N$ of $\Omega$, the objective function is continuously differentiable, and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$.

We need the following lemma, which was proved by Zoutendijk [6].

Lemma 2.1. Suppose that Assumption 2.1 holds. Consider any conjugate gradient method of the form (1.2) and (1.3), where the step length $\alpha_k$ is computed by the strong Wolfe line search. Then the following condition, known as the Zoutendijk condition, is satisfied:

$$\sum_{k=0}^{\infty} \|g_k\|^2 \cos^2 \theta_k < \infty, \qquad (2.15)$$

where $\theta_k$ is the angle between $d_k$ and the steepest descent direction $-g_k$, which is given by

$$\cos \theta_k = \frac{-g_k^T d_k}{\|g_k\| \|d_k\|}. \qquad (2.16)$$

Theorem 2.3. Suppose that Assumption 2.1 holds. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by the RMIL+ conjugate gradient method under the strong Wolfe line search with $0 < \sigma < \frac{1}{4}$. Then

$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \qquad (2.17)$$

Proof. Multiplying (2.9) by $-\frac{\|g_k\|}{\|d_k\|}$ and using (2.16), we get

$$c_2 \frac{\|g_k\|}{\|d_k\|} < \cos \theta_k < c_1 \frac{\|g_k\|}{\|d_k\|}, \qquad (2.18)$$

where $c_1 = 1 + 2\sigma$ and $c_2 = 1 - 2\sigma$. Since $c_2 > 0$ when $0 < \sigma < \frac{1}{4}$ (see (2.3)), then $\cos \theta_k > 0$. Hence,

$$c_2^2 \frac{\|g_k\|^2}{\|d_k\|^2} < \cos^2 \theta_k.$$

This implies

$$c_2^2 \sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \sum_{k=0}^{\infty} \|g_k\|^2 \cos^2 \theta_k. \qquad (2.19)$$

From (2.15) and (2.19) together, it follows that

$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty.$$

This completes the proof.



The proof of global convergence is given in the following theorem.

Theorem 2.4. Suppose that Assumption 2.1 holds. Then the RMIL+ conjugate gradient method is globally convergent under the strong Wolfe line search with $0 < \sigma < \frac{1}{4}$, that is,

$$\lim_{k \to \infty} \|g_k\| = 0. \qquad (2.20)$$

Proof. The proof is by contradiction. Suppose that (2.20) does not hold; then there exist a constant $c > 0$ and an integer $k_1$ such that

$$\|g_k\| \ge c, \quad \text{for all } k \ge k_1. \qquad (2.21)$$


This means

$$\frac{1}{\|g_k\|^2} \le \frac{1}{c^2}, \quad \text{for all } k \ge k_1. \qquad (2.22)$$

From (1.3), by squaring both sides of $d_k + g_k = \beta_k^{RMIL+} d_{k-1}$, we get

$$\|d_k\|^2 = -\|g_k\|^2 - 2 g_k^T d_k + \left( \beta_k^{RMIL+} \right)^2 \|d_{k-1}\|^2, \quad \forall\, k \ge 1. \qquad (2.23)$$

From (2.9), we obtain

$$(2 - 4\sigma) \|g_k\|^2 < -2 g_k^T d_k < (2 + 4\sigma) \|g_k\|^2, \quad \forall\, k \ge 0.$$

Since $-2 g_k^T d_k < (2 + 4\sigma) \|g_k\|^2$ is always true for all $k \ge 0$, we can use it together with (2.23) to get

$$\|d_k\|^2 < -\|g_k\|^2 + (2 + 4\sigma) \|g_k\|^2 + \left( \beta_k^{RMIL+} \right)^2 \|d_{k-1}\|^2, \quad \forall\, k \ge 1,$$

which leads to

$$\|d_k\|^2 < (1 + 4\sigma) \|g_k\|^2 + \left( \beta_k^{RMIL+} \right)^2 \|d_{k-1}\|^2.$$

Substituting (2.1), we get

$$\|d_k\|^2 < (1 + 4\sigma) \|g_k\|^2 + \frac{\|g_k\|^4}{\|d_{k-1}\|^4} \|d_{k-1}\|^2.$$

Dividing both sides by $\|g_k\|^4$, we obtain

$$\frac{\|d_k\|^2}{\|g_k\|^4} < \frac{1 + 4\sigma}{\|g_k\|^2} + \frac{1}{\|d_{k-1}\|^2}.$$

Substituting (2.8), we get

$$\frac{\|d_k\|^2}{\|g_k\|^4} < \frac{1 + 4\sigma}{\|g_k\|^2} + \frac{4}{\|g_{k-1}\|^2}. \qquad (2.24)$$

Combining (2.22) and (2.24) together, we obtain

$$\frac{\|d_k\|^2}{\|g_k\|^4} < \frac{1 + 4\sigma}{c^2} + \frac{4}{c^2} = \frac{5 + 4\sigma}{c^2}, \quad \text{for all } k \ge k_1 + 1.$$

This means

$$\frac{\|g_k\|^4}{\|d_k\|^2} > \frac{c^2}{5 + 4\sigma}, \quad \text{for all } k \ge k_1 + 1. \qquad (2.25)$$

Since (2.25) is true for all $k \ge k_1 + 1$, then

$$\sum_{k=k_1+1}^{n} \frac{\|g_k\|^4}{\|d_k\|^2} > \sum_{k=k_1+1}^{n} \frac{c^2}{5 + 4\sigma} = (n - k_1) \frac{c^2}{5 + 4\sigma}.$$

Hence,

$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \sum_{k=k_1+1}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} = \lim_{n \to \infty} \sum_{k=k_1+1}^{n} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \lim_{n \to \infty} (n - k_1) \frac{c^2}{5 + 4\sigma} = \infty.$$

This contradicts Theorem 2.3. Therefore, the proof is completed.



3. Numerical experiment

In this section, the RMIL+, RMIL, FR, and NPRP conjugate gradient methods have been implemented under the strong Wolfe line search with $\delta = 10^{-4}$ and $\sigma = 10^{-1}$; a sketch of the corresponding step-acceptance test is given after the list below. We use these methods in the comparison because:
• RMIL+ is a modified version of RMIL whose sufficient descent property and global convergence have been established under the strong Wolfe line search in Section 2, whereas the sufficient descent property and the global convergence of RMIL have not yet been established under the strong Wolfe line search.
• FR is one of the most famous conjugate gradient methods. It satisfies the sufficient descent property and is globally convergent under the strong Wolfe line search [10].
• NPRP is a modified version of PRP. It also satisfies the sufficient descent property and is globally convergent under the strong Wolfe line search [9].
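As a hedged illustration of the acceptance test used in these runs, the following Python sketch (our own; the paper's implementation is in MATLAB) checks whether a trial step length satisfies the strong Wolfe conditions (1.4)–(1.5) with the parameters above; the name `satisfies_strong_wolfe` is hypothetical, and `x`, `d` are assumed to be NumPy arrays.

```python
def satisfies_strong_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=1e-1):
    """Return True if alpha satisfies both strong Wolfe conditions at x along d."""
    g = grad(x)
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * g.dot(d)          # condition (1.4)
    curvature = abs(grad(x + alpha * d).dot(d)) <= sigma * abs(g.dot(d))  # condition (1.5)
    return armijo and curvature
```

A full line search, such as the bracketing and zoom procedure of Nocedal and Wright [8] used here, repeatedly adjusts the trial step until a test of this kind passes.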


Fig. 1. Performance profile based on number of iterations.

Fig. 2. Performance profile based on CPU time.

The implementation is based on the Nocedal–Wright line search algorithm for the Wolfe conditions [8] and is coded in MATLAB, with the stopping criterion set to $\|g_k\| < 10^{-6}$. The runs have been conducted on a PC with an Intel Core i5-2520M CPU @ 2.50 GHz, 4 GB of RAM, and the Windows 7 Professional operating system. Most of the test problems are from [14]. To show robustness, the test problems have been implemented with low, medium, and high dimensions, namely 2, 3, 4, 10, 50, 100, 500, 1000, and 10,000. Furthermore, for each dimension, two different initial points are used, one of which is the initial point suggested by Andrei [14]. The comparison is based on the number of iterations (NOI) and the time in seconds required for running each of the test problems (CPU). In Table 1, Dim. stands for dimension and $x_0$ for the initial point. Also in Table 1, a method is considered to have failed, and we report "Fail", if the CPU time exceeded 10 min (600 s) or a non-descent search direction occurred while running. In Table 1, FR failed to solve only one problem (time exceeded 600 s) and RMIL failed to solve ten problems (a non-descent search direction occurred in each case), whereas RMIL+ and NPRP solved all problems.

Based on Table 1, we show the performance of the FR, NPRP, RMIL, and RMIL+ methods in Figs. 1 and 2 using the performance profile introduced by Dolan and Moré [3]. That is, we plot the fraction P of the test problems for which any given method is within a factor t of the best. In a plot of performance profiles, the method whose curve lies on top is the winner. Additionally, the right-hand side of the plot is a measure of a method's robustness.
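To make the construction of Figs. 1 and 2 concrete, the following Python sketch (ours, not the paper's code) computes a Dolan–Moré performance profile [3]: for each method it plots the fraction of problems whose cost (NOI or CPU time) is within a factor t of the best method on that problem, with failures encoded as infinite cost. The function name `performance_profile` and the use of matplotlib are assumptions of the sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, labels, t_max=10.0):
    """Dolan-More profile. costs: (n_problems, n_methods) array, np.inf = failure.
    Assumes every problem is solved by at least one method."""
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)              # best cost on each problem
    ratios = costs / best                                 # performance ratios
    ts = np.linspace(1.0, t_max, 200)
    for j, label in enumerate(labels):
        frac = [(ratios[:, j] <= t).mean() for t in ts]   # fraction within factor t of the best
        plt.step(ts, frac, where="post", label=label)
    plt.xlabel("t")
    plt.ylabel("fraction of problems within factor t of the best")
    plt.legend()
    plt.show()
```

Called, say, with the NOI columns of Table 1 stacked into an array of shape (number of runs, 4) and `labels = ["FR", "NPRP", "RMIL+", "RMIL"]`, it would produce a plot of the kind shown in Fig. 1.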


Table 1 Comparison between FR, NPRP, RMIL, and RMIL+ methods. No.

Test problem

1

Booth

2

4

Three-hump camel Six-hump camel Trecanni

5

Zettl

6

NONSCOMP

7

Matyas

8

Dixon and Price Freudenstein & Roth Generlized Tridiagonal 2 Ex. Quadratic Penalty QP1 Fletcher

3

9 10 11 12 13 14

Generlized Tridiagonal 1 Hager

15

Arwhead

16

Liarwhd

17

Power

18

Raydan 1

19

Extended DENSCHNB

20

Extended Penalty

21

Ex. Quadratic Penalty QP2 Quadratic QF2 Quadratic QF1

22 23

24

Extended Tridiagonal 1

25

Diagonal 4

26

Extended Rosenbrock

Dim.

2

x0

(5, 5) (10, 10) 2 (−1, 2) (−5, 10) 2 (−1, 2) (−5, 10) 2 (−1, 0.5) (−5, 10) 2 (−1, 2) (10, 10) 2 (3, 3) (10, 10) 2 (1, 1) (20, 20) 3 (1, 1, 1) (10, 10, 10) 4 (0.5, −2, 0.5, −2) (5, 5, 5, 5) 4 (1, 1, 1, 1) (10, 10, 10, 10) 4 (1, 1, 1, 1) (10, 10, 10, 10) 10 (0, 0,…, 0) (10, 10,…, 10) 10 (2, 2,…, 2) (10, 10,…, 10) 10 (1, 1,…, 1) (−10, −10,…, −10) 10 (1, 1,…, 1) (10, 10,…, 10) 10 (4, 4,…, 4) (0, 0,…, 0) 10 (1,1,…, 1) (10, 10,…, 10) 10 (1,1,…, 1) (10, 10,…, 10) 100 (−1,−1,…, −1) (−10, −10,…, −10) 10 (1, 1,…, 1) (10, 10,…, 10) 100 (10, 10,…, 10) (−50, −50,…, −50) 10 (1, 2,…, 10) (−10, −10,…, −10) 100 (5, 5,…, 5) (−10, −10,…, −10) 100 (1, 1,…, 1) (10, 10,…, 10) 50 (0.5, 0.5,…,0.5) (30, 30,…, 30) 50 (1, 1, .., 1) (10, 10,…, 10) 500 (1, 1,…, 1) (−5, −5,…, −5) 500 (2, 2,…, 2) (10, 10,…, 10) 1000 (1, 1,…, 1) (−10, 10,…, −10, 10) 500 (1, 1,…, 1) (−20, −20, .., −20) 1000 (1, 1,…, 1) (−30, −30,…, −30) 1000 (−1.2, 1,…, −1.2, 1) (10, 10,…, 10) 10,000 (−1.2, 1,…, −1.2, 1) (−5, −5,…, −5)

FR

NPRP

RMIL+

RMIL

NOI

CPU

NOI

CPU

NOI

CPU

NOI

CPU

2 2 12 12 10 125 1 189 12 27 18 240 1 1 75 18 24 23 8 11 21 77 1128 1902 27 37 11 102 11 16 Fail 1 10 10 19 2440 94 868 7 72 13 77 17 16 161 12 216 266 116 1415 38 40 131 137 340 717 399 751 2 2 2 2 88 273 89 48

0.01 0.01 0.03 0.03 0.03 0.06 0.01 0.07 0.04 0.05 0.03 0.08 0.01 0.01 0.05 0.02 0.03 0.03 0.02 0.02 0.05 0.06 0.32 0.51 0.03 0.04 0.02 0.05 0.02 0.04 Fail 0.01 0.03 0.03 0.03 0.54 0.09 0.33 0.02 0.06 0.04 0.10 0.03 0.03 0.18 0.02 0.41 0.46 0.09 0.42 0.03 0.04 0.30 0.19 0.66 1.23 1.57 2.91 0.02 0.02 0.05 0.05 0.43 1.95 25.23 8.64

2 2 9 18 7 11 1 9 12 12 12 14 1 1 26 54 10 11 8 9 17 13 86 194 22 29 12 19 10 11 29 1 10 10 27 30 97 128 6 9 10 11 16 8 24 17 47 46 92 95 38 40 627 729 25 31 25 29 2 2 5 5 28 75 28 34

0.01 0.01 0.03 0.04 0.02 0.03 0.01 0.03 0.04 0.04 0.03 0.03 0.01 0.01 0.04 0.04 0.02 0.02 0.02 0.02 0.04 0.03 0.04 0.10 0.02 0.03 0.02 0.03 0.02 0.02 0.10 0.01 0.03 0.03 0.04 0.04 0.10 0.15 0.02 0.03 0.04 0.05 0.03 0.02 0.04 0.03 0.11 0.10 0.07 0.08 0.03 0.04 0.72 0.75 0.11 0.10 0.17 0.18 0.02 0.02 0.05 0.05 0.17 0.31 6.43 6.53

2 2 15 17 9 10 1 10 20 23 18 Fail 1 1 40 35 12 Fail 4 11 14 20 72 138 21 27 12 17 8 9 46 1 123 139 19 Fail 107 178 7 8 8 11 Fail Fail Fail Fail 37 Fail 78 78 69 78 447 500 28 10 28 14 2 2 2 2 30 51 30 31

0.01 0.01 0.04 0.04 0.03 0.03 0.01 0.03 0.05 0.05 0.03 Fail 0.01 0.01 0.05 0.04 0.02 Fail 0.01 0.02 0.03 0.04 0.04 0.06 0.02 0.03 0.02 0.03 0.01 0.02 0.12 0.01 0.06 0.09 0.03 Fail 0.09 0.18 0.02 0.02 0.03 0.04 Fail Fail Fail Fail 0.09 Fail 0.06 0.06 0.04 0.06 0.65 0.63 0.12 0.05 0.19 0.11 0.02 0.02 0.05 0.05 0.18 0.20 6.95 5.96

2 2 11 13 6 9 1 15 20 14 427 194 1 1 157 151 12 14 8 10 16 10 146 317 32 40 12 19 14 14 43 1 87 95 22 36 267 707 7 15 15 9 79 51 377 394 77 56 253 259 83 73 269 251 114 39 114 6 5 4 4 6 320 566 314 429

0.01 0.01 0.03 0.03 0.02 0.03 0.01 0.04 0.05 0.04 0.12 0.07 0.01 0.01 0.09 0.08 0.02 0.02 0.02 0.02 0.04 0.02 0.06 0.12 0.03 0.04 0.02 0.03 0.03 0.03 0.12 0.01 0.04 0.05 0.03 0.05 0.17 0.25 0.02 0.03 0.05 0.04 0.05 0.04 0.21 0.19 0.13 0.11 0.10 0.10 0.05 0.05 0.39 0.37 0.31 0.17 0.52 0.06 0.03 0.03 0.06 0.07 0.83 1.60 46.49 64.10

Table 1 (continued)
No.

27

Test problem

Extended Himmelblau

Dim.

1000 10,000

28

Strait

1000 10,000

29

Shallow

1000 10,000

30

Extended Beale

1000 10,000

x0

(1, 1,…, 1) (20, 20,…, 20) (−1, −1,…, −1) (50, 50,…, 50) (0, 0,…, 0) (5, 5,…, 5) (0, 0,…, 0) (5, 5,…, 5) (0, 0,…, 0) (10, 10,…, 10) (−1, −1,…, −1) (−10, −10,…, −10) (1, 0.8,…, 1, 0.8) (0.5, 0.5,…, 0.5) (−1, −1,…, −1) (0.5, 0.5,…, 0.5)

FR

NPRP

RMIL+

RMIL

NOI

CPU

NOI

CPU

NOI

CPU

NOI

CPU

14 10 24 19 41 104 39 88 11 347 31 16 89 47 67 48

0.10 0.06 5.60 3.58 0.14 0.38 6.55 15.66 0.09 1.02 4.72 2.98 0.47 0.33 11.00 7.97

12 7 15 18 27 30 23 32 6 16 13 21 34 27 12 27

0.08 0.05 2.64 3.37 0.11 0.16 4.19 6.63 0.05 0.12 2.22 3.80 0.26 0.28 2.49 4.94

12 10 9 Fail 52 Fail 42 Fail 29 30 51 Fail 41 53 19 55

0.07 0.06 1.79 Fail 0.16 Fail 6.88 Fail 0.11 0.19 7.39 Fail 0.20 0.40 3.68 9.32

22 12 12 19 38 42 22 79 107 83 9 19 359 11 21 12

0.11 0.07 2.21 3.43 0.13 0.16 3.87 14.89 0.33 0.31 1.60 3.60 1.35 0.12 4.21 2.41

Since the number of iterations and the CPU time depend on each other, Figs. 1 and 2 are almost the same. An observation of the left sides of both figures shows that NPRP and RMIL have a lower number of iterations (CPU time). Further observation of the right sides shows that NPRP and RMIL+ solve all problems and hence reach 100%, whereas FR solves about 99% and RMIL solves about 88%. Moreover, in most cases, Figs. 1 and 2 show that the curve of RMIL+ lies above both the curves of FR and RMIL. Therefore, RMIL+ under the strong Wolfe line search can be used in practical computations.

4. Conclusions

In this paper, under some assumptions, the global convergence and the sufficient descent property of the RMIL+ conjugate gradient method, when it is applied under the strong Wolfe line search with $0 < \sigma < \frac{1}{4}$, have been established. A numerical experiment has shown that RMIL+ under the strong Wolfe line search can be used successfully in practical computations.

Acknowledgment

The author would like to thank the editors and the referees for their valuable suggestions and comments, which led us to improve this paper.

References

[1] B.T. Polyak, The conjugate gradient method in extremal problems, USSR Comput. Math. Math. Phys. 9 (1969) 94–112.
[2] D. Touati-Ahmed, C. Storey, Efficient hybrid conjugate gradient techniques, J. Optim. Theory Appl. 64 (1990) 379–397.
[3] E. Dolan, J.J. Moré, Benchmarking optimization software with performance profiles, Math. Program. 91 (2002) 201–213.
[4] E. Polak, G. Ribière, Note sur la convergence de méthodes de directions conjuguées, Rev. Française Informat. Recherche Opérationnelle, 3e Année 16 (1969) 35–43.
[5] G.H. Liu, J.Y. Han, H.X. Yin, Global convergence of the Fletcher–Reeves algorithm with an inexact line search, Report, Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing, 1993.
[6] G. Zoutendijk, Nonlinear programming, computational methods, in: J. Abadie (Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, pp. 37–86.
[7] J.C. Gilbert, J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optim. 2 (1992) 21–42.
[8] J. Nocedal, S.J. Wright, Numerical Optimization, second ed., Springer Science+Business Media, LLC, 2006.
[9] L. Zhang, An improved Wei–Yao–Liu nonlinear conjugate gradient method for optimization computation, Appl. Math. Comput. 215 (2009) 2269–2274.
[10] M. Al-Baali, Descent property and global convergence of the Fletcher–Reeves method with inexact line search, IMA J. Numer. Anal. 5 (1985) 121–124.
[11] M.J.D. Powell, Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics, vol. 1066, Springer-Verlag, Berlin, 1984, pp. 122–141.
[12] M.R. Hestenes, E.L. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49 (1952) 409–436.
[13] M. Rivaie, M. Mamat, W.J. Leong, I. Mod, A new class of nonlinear conjugate gradient coefficient with global convergence properties, Appl. Math. Comput. 218 (2012) 11323–11332.
[14] N. Andrei, An unconstrained optimization test functions collection, Adv. Model. Optim. 10 (1) (2008) 147–161.
[15] R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization, John Wiley & Sons, New York, 1987.
[16] R. Fletcher, C. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (1964) 149–154.
[17] Y.F. Hu, C. Storey, Global convergence result for conjugate gradient methods, J. Optim. Theory Appl. 71 (1991) 399–405.
[18] Y. Liu, C. Storey, Efficient generalized conjugate gradient algorithms, Part 1: Theory, J. Optim. Theory Appl. 69 (1992) 129–137.
[19] Y.H. Dai, Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property, SIAM J. Optim. 10 (2000) 177–182.
[20] Z. Dai, Comments on a new class of nonlinear conjugate gradient coefficients with global convergence properties, Appl. Math. Comput. 276 (2016) 297–300.