Nonmonotone gradient methods for vector optimization with a portfolio optimization application


Accepted Manuscript

Nonmonotone Gradient Methods for Vector Optimization with a Portfolio Optimization Application

Shaojian Qu, Ying Ji, Jianlin Jiang, Qingpu Zhang

PII: S0377-2217(17)30455-1
DOI: 10.1016/j.ejor.2017.05.027
Reference: EOR 14455

To appear in: European Journal of Operational Research

Received date: 14 October 2015
Revised date: 10 February 2017
Accepted date: 12 May 2017

Please cite this article as: Shaojian Qu, Ying Ji, Jianlin Jiang, Qingpu Zhang, Nonmonotone Gradient Methods for Vector Optimization with a Portfolio Optimization Application, European Journal of Operational Research (2017), doi: 10.1016/j.ejor.2017.05.027

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

• Vector optimization is studied.
• Two nonmonotone gradient algorithms are proposed for vector optimization.
• The global and local convergence results for the new algorithms are presented.
• The efficiency of the new algorithms is shown by an application to a portfolio optimization problem.

Nonmonotone Gradient Methods for Vector Optimization with a Portfolio Optimization Application

Shaojian Qu 1,2,* , Ying Ji 1,2 , Jianlin Jiang 3 , Qingpu Zhang 4

1. Business School, University of Shanghai for Science and Technology, 516 Jungong Road, Shanghai 200093, P.R. China. Tel: +86-15601980161
2. The Institute of Logistics-Asia Pacific, National University of Singapore
3. College of Science, Nanjing University of Aeronautics and Astronautics
4. School of Management, Harbin Institute of Technology

May 26, 2017

Abstract

This paper proposes two nonmonotone gradient algorithms for a class of vector optimization problems with a C-convex objective function. We establish both global and local convergence results for the new algorithms. We then apply the new algorithms to a portfolio optimization problem under multi-criteria considerations.

Key words: (S) Multiple objective programming; Nonmonotone gradient algorithms; Pareto optimum; Convergence; Portfolio optimization

* Email address: [email protected]

1 Introduction

Vector-valued optimization stems from multi-objective programming, multi-criteria decision making, statistics, and cooperative game theory (see Jahn, 2003; Handl, Kell and Knowles, 2007). Such optimization problems have been extensively studied and applied in various decision-making contexts. Let C ⊆ R^m be a convex, closed and pointed cone with int(C) ≠ ∅. We define the partial order ⪯ as follows: for any u, v ∈ R^m, u ⪯ v ⟺ v − u ∈ C. We also define the relation ≺ as u ≺ v ⟺ v − u ∈ int(C). In this paper, we consider a class of C-convex vector optimization problems,

    min_C f(x)   s.t.   x ∈ S,        (1.1)

where f : R^n → R^m is C-convex (see the definition in Section 2) and S ⊆ R^n is the constraint set, assumed to be closed and convex. Our goal is to propose two implementable nonmonotone gradient algorithms to find a critical point (see Definition 2.2) of (1.1). Note that min_C denotes the optimum with respect to the cone C. Many practical problems can be cast in the format of (1.1); the interested reader is referred to Pardalos and Hearn (1999, Chap. 10) and Ehrgott (2000) for detailed modelling in this regard.
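As a concrete illustration (ours, not the authors'): for the Pareto cone C = R_+^m, the relations ⪯ and ≺ defined above reduce to componentwise comparisons, which can be sketched in a few lines of Python:

```python
def leq_C(u, v):
    """u ⪯ v  iff  v - u ∈ C, here with C = R_+^m (componentwise order)."""
    return all(vi - ui >= 0 for ui, vi in zip(u, v))

def lt_C(u, v):
    """u ≺ v  iff  v - u ∈ int(C): strict componentwise inequality."""
    return all(vi - ui > 0 for ui, vi in zip(u, v))

u = [1.0, 2.0]
v = [1.0, 3.0]
print(leq_C(u, v), lt_C(u, v))  # prints: True False  (first component ties, so ≺ fails)
```

Note that ⪯ is only a partial order: for many pairs neither u ⪯ v nor v ⪯ u holds, which is exactly why Pareto-type optimality notions are needed.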

Recently, several iterative methods for solving scalar optimization problems have been extended to multi-criteria optimization (see Villacorta and Oliveira, 2011; Bello Cruz, Pérez, and Melo, 2011; Bonnel, Iusem, and Svaiter, 2005; Qu, Goh, Souza and Wang, 2014; Bento, Cruz Neto, and Soubeyran, 2014; Iusem, Svaiter, and Teboulle, 1994; Qu, Goh, Ji, and Souza, 2014; Brito, Cruz Neto, Santos, and Souza, 2017; Carrizo, Lotito, and Maciel, 2016). Our study is in line with these ideas: we propose a similar extension for the case of nonmonotone gradient methods for scalar convex optimization.

Gradient-based methods, which seek a point satisfying the first-order conditions for Pareto optimality, are already widely used in the literature. Fliege and Svaiter (2000) presented a steepest descent method for multiobjective optimisation that does not rely on scalarisation approaches. At every iteration, this method has two features: (i) an extended steepest descent step produces a direction along which feasible solutions dominating the current point can be found; (ii) to find a point that dominates the current one along this direction, a line search based on the Armijo rule is conducted. Vieira et al. (2012) generalise this idea by replacing the Armijo rule with a multiobjective golden section line search. Fliege et al. (2009) propose Newton methods for multiobjective convex optimization; under twice continuous differentiability and strong local convexity, superlinear convergence is established. Qu et al. (2011, 2014) generalize this approach to Quasi-Newton methods for multiobjective optimization without convexity assumptions; under standard assumptions, global and local convergence results are given.

To our knowledge, the gradient-based methods listed above mostly use monotone line search, and the monotonicity of the function values plays a crucial role in their convergence proofs. Our gradient-based algorithms differ from these methods in that they use a nonmonotone line search. Bello Cruz (2013) presents a subgradient method for vector optimization problems that does not use scalar-valued objectives and does not need monotonicity of the function values in the convergence analysis; the main difference between our method and that of Bello Cruz (2013) is that we employ a nonmonotone line search, whereas Bello Cruz (2013) uses no line search at all.

The primary contributions of this paper are as follows. We extend the nonmonotone gradient algorithm to a vector optimization setting and demonstrate the power of the new algorithm in solving vector optimization problems. We prove global convergence for the new algorithm and analyze its local linear convergence rate. We apply the new method to portfolio management, where conflicting multi-criteria considerations can affect performance.

This paper is organized as follows. Section 2 presents some preliminaries and notation. Section 3 extends the nonmonotone gradient algorithm to a vector optimization setting and presents the convergence of the new algorithm; the convergence rate is also presented in this section. An application to a portfolio management problem is presented in Section 4. Section 5 concludes.

2 Preliminaries

In what follows, R is the set of real numbers, R_+ denotes the set of non-negative real numbers, and R_++ is the set of strictly positive real numbers. A function f : R^n → R^m is C-convex iff for any x, y ∈ R^n and any λ ∈ [0, 1],

    f(λx + (1 − λ)y) ⪯ λf(x) + (1 − λ)f(y).

The function f is strictly C-convex iff for any x, y ∈ R^n with x ≠ y and any λ ∈ (0, 1),

    f(λx + (1 − λ)y) ≺ λf(x) + (1 − λ)f(y).

When f is differentiable, we denote by Jf(x) ∈ R^{m×n} the Jacobian of f at x. The indicator function of a set S ⊆ R^n is defined as δ(x; S) := 0 if x ∈ S and δ(x; S) := +∞ if x ∉ S. The normal cone to a convex set S at x̄ ∈ S is defined by

    N_S(x̄) := ∂δ(x̄; S) = {x* ∈ R^n | ⟨x*, x − x̄⟩ ≤ 0, ∀x ∈ S}.

In a multi-criteria setting, there are many optimality concepts. Strong Pareto optimality (SPO) and weak Pareto optimality (WPO) are two of the most important, defined mathematically as follows.

Definition 2.1 A point x* ∈ S is a weak Pareto optimum (WPO) (resp. strong Pareto optimum (SPO)) of (1.1) iff there is no x ∈ S such that f(x) ≺ f(x*) (resp. f(x) ⪯ f(x*) with f(x) ≠ f(x*)).

A necessary (but in general not sufficient) condition for Pareto optimality is defined as follows (see Bello Cruz, Pérez, and Melo, 2011; Qu, Goh, Souza and Wang, 2014; Fliege and Svaiter, 2000).

Definition 2.2 x* ∈ S is a critical point (or stationary point) for f if

    R(Jf(x*)) ∩ (−int(C)) = ∅,        (2.1)

where R(Jf(x*)) denotes the range (image space) of the Jacobian of the continuously differentiable function f at x*.
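To make (2.1) concrete (our sketch, under assumptions the paper does not fix here): for the Pareto cone C = R_+^2 and S = R^n, a point is critical exactly when no direction d makes both directional derivatives negative, which holds iff 0 lies in the convex hull of the two objective gradients. For two objectives that min-norm point has a closed form:

```python
def min_norm_in_segment(g1, g2):
    """Min-norm point of conv{g1, g2}; 0 is in the hull iff this point is ~0.

    For C = R_+^2 and S = R^n, x is critical for (1.1) exactly when
    0 ∈ conv{∇f1(x), ∇f2(x)}, i.e. no d with Jf(x)d < 0 componentwise.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(t * t for t in diff)
    lam = 0.0 if denom == 0.0 else sum(b * (b - a) for a, b in zip(g1, g2)) / denom
    lam = min(1.0, max(0.0, lam))  # clip back onto the segment
    return [lam * a + (1 - lam) * b for a, b in zip(g1, g2)]

def is_critical(g1, g2, tol=1e-10):
    g = min_norm_in_segment(g1, g2)
    return sum(t * t for t in g) <= tol

# Hypothetical example: f1(x) = (x1-1)^2 + x2^2, f2(x) = (x1+1)^2 + x2^2.
# At x = (0, 0) the gradients are (-2, 0) and (2, 0), so 0 is in their hull.
print(is_critical([-2.0, 0.0], [2.0, 0.0]))   # prints: True
print(is_critical([-2.0, 1.0], [2.0, 1.0]))   # prints: False (d = (0, -1) decreases both)
```

The constrained case (general S) replaces this hull test with the normal-cone condition used later in (3.9).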

The above concept has been used to develop gradient methods for solving Problem (1.1) and, in recent years, to develop descent-type algorithms for vector optimization more broadly (Fliege and Svaiter, 2000; Qu, Goh, and Chan, 2011). In the rest of this section, we present an alternative characterization of criticality.

Define the positive polar cone of C, C* := {y ∈ R^m : yᵀx ≥ 0, ∀x ∈ C}. Given a point x ∈ S, a positive definite matrix B ∈ R^{n×n} and a closed, nonempty set U ⊆ {u ∈ C* : ‖u‖ = 1}, we denote by d_B(x) an optimal solution of the following problem:

    min_d max_{u∈U} uᵀJf(x)d + (1/2)dᵀBd,   s.t.  ‖d‖ ≤ 1,  x + d ∈ S.        (2.2)

The feasible set of Problem (2.2) is always nonempty, since d = 0 is feasible for any x ∈ S. The norm constraint ‖d‖ ≤ 1 is used to improve performance, as it rules out the possibly unbounded case ‖d‖ → ∞; this device is also used in Qu, Goh, and Chan (2011). In Problem (2.2) we only assume closedness of U, not convexity. In fact, we can use the convex hull of U in (2.2) without changing the problem: finding an optimal solution of (2.2) is equivalent to finding an optimal solution of

    min_d max_{u∈Conv(U)} uᵀJf(x)d + (1/2)dᵀBd,   s.t.  ‖d‖ ≤ 1,  x + d ∈ S,        (2.3)

where Conv(·) denotes the convex hull. This is shown in the following lemma.

Lemma 2.3 For any given x ∈ S, the optimal solution set of (2.2) is also the optimal solution set of (2.3).

Proof. For any given x ∈ S, we first cast (2.2) in the form min{r | (r, d) ∈ Ω(x)}, with

    Ω(x) := {(r, d) ∈ R^{n+1} | x + d ∈ S, uᵀJf(x)d + (1/2)dᵀBd ≤ r, ∀u ∈ U, ‖d‖ ≤ 1}.

Define

    Ω̃(x) := {(r, d) ∈ R^{n+1} | x + d ∈ S, uᵀJf(x)d + (1/2)dᵀBd ≤ r, ∀u ∈ Conv(U), ‖d‖ ≤ 1},  x ∈ S.

The proof reduces to showing that Ω(x) = Ω̃(x). Clearly, Ω̃(x) ⊆ Ω(x) for any x ∈ S. Next, we show that Ω(x) ⊆ Ω̃(x). Since Ω(x) ≠ ∅ for any given x ∈ S, take (r, d) ∈ Ω(x); we show that (r, d) ∈ Ω̃(x). Given any u ∈ Conv(U), there exist u^i ∈ U, i = 1, …, k, such that u = Σ_{i=1}^k λ_i u^i, where λ_i ≥ 0

(i = 1, …, k) and Σ_{i=1}^k λ_i = 1, with some k ≤ m + 1. It follows from the definition of Ω(x) that (u^i)ᵀJf(x)d + (1/2)dᵀBd ≤ r for i = 1, …, k. Therefore,

    uᵀJf(x)d + (1/2)dᵀBd ≤ Σ_{i=1}^k λ_i (u^i)ᵀJf(x)d + (1/2)dᵀBd ≤ r,

which implies that (r, d) ∈ Ω̃(x). Then Ω(x) ⊆ Ω̃(x), and therefore Ω(x) = Ω̃(x) for all x ∈ S. □

Lemma 2.3 implies that we can always assume U to be closed and convex. For the rest of this paper, we assume that U ⊆ {y ∈ C* : ‖y‖ = 1} is closed and convex, and that the cone generated by its convex hull is C*. The following lemma provides an alternative characterization of criticality that will be used in our development of algorithms for (1.1).

Lemma 2.4 For any positive definite matrix B ∈ R^{n×n}, x* ∈ S is a critical point of Problem (1.1) if and only if d_B(x*) = 0, where d_B(x*) is an optimal solution of Problem (2.2) with x := x*.

Proof. First, we show that if x* ∈ S is a critical point of Problem (1.1), then d_B(x*) = 0, where d_B(x*) is an optimal solution of Problem (2.2) with x = x*. We argue by contradiction: if d_B(x*) ≠ 0, we show that Jf(x*)d_B(x*) ∈ −int C. It is easy to see that the optimal value Sub_B(x*) of (2.2) at x* is non-positive, since d = 0 is always feasible for (2.2). As B is positive definite and d_B(x*) ≠ 0, from the definition of d_B(x*) we have, for all u ∈ U,

    uᵀJf(x*)d_B(x*) < uᵀJf(x*)d_B(x*) + (1/2)d_B(x*)ᵀB d_B(x*)
                    ≤ max_{u∈U} uᵀJf(x*)d_B(x*) + (1/2)d_B(x*)ᵀB d_B(x*)
                    = Sub_B(x*) ≤ 0.

This means that Jf(x*)d_B(x*) ∈ −int C, so x* is noncritical, contradicting the assumption that x* ∈ S is a critical point.

Next, we show by contradiction that if d_B(x*) = 0 is an optimal solution of Problem (2.2) with x = x*, then x* is a critical point of Problem (1.1). If not, there exists a direction d̄ ∈ R^n such that uᵀJf(x*)d̄ < 0 for all u ∈ U. From the positive definiteness of B, there exists ᾱ ∈ (0, 1] such that ᾱuᵀJf(x*)d̄ + (1/2)ᾱ²d̄ᵀBd̄ < 0 for all u ∈ U, which means that the optimal value of (2.2) is negative. This contradicts d_B(x*) = 0. Hence the assertion of the lemma is true. □
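To make the direction subproblem concrete (our sketch, not the authors' implementation): for C = R_+^m, U = {e_1, …, e_m}, S = R^n and a diagonal B, problem (2.2) is a small min-max over the unit ball and can be solved approximately by projected subgradient descent; in practice one would solve its QP reformulation exactly.

```python
import math

def direction_subproblem(grads, B_diag, iters=2000):
    """Approximately solve (2.2) for C = R_+^m, U = {e_1,...,e_m}, S = R^n:
        min_d  max_i g_i^T d + 0.5 * d^T B d   s.t.  ||d|| <= 1,
    with diagonal B, by projected subgradient descent (a sketch only)."""
    n = len(grads[0])
    d = [0.0] * n
    for t in range(1, iters + 1):
        vals = [sum(g[j] * d[j] for j in range(n)) for g in grads]
        i_star = max(range(len(grads)), key=lambda i: vals[i])  # active objective
        step = 1.0 / t
        d = [d[j] - step * (grads[i_star][j] + B_diag[j] * d[j]) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm > 1.0:                      # project back onto the unit ball
            d = [x / norm for x in d]
    return d

# Hypothetical gradients of two objectives at the current point:
g1, g2 = [2.0, 0.0], [0.0, 2.0]
d = direction_subproblem([g1, g2], B_diag=[1.0, 1.0])
# Both g1·d and g2·d should be negative: d is a common descent direction.
```

If the returned d is (numerically) zero, Lemma 2.4 identifies the current point as critical.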

Note that the objective function max_{u∈U} uᵀJf(x)d + (1/2)dᵀBd in Problem (2.2) is not necessarily differentiable, which makes (2.2) difficult to solve directly. By introducing a variable β, however, the problem can be reduced to

    min_{β,d} β + (1/2)dᵀBd,   s.t.  uᵀJf(x)d ≤ β, u ∈ U,  ‖d‖ ≤ 1,  x + d ∈ S.        (2.4)

3 Nonmonotone Gradient Algorithms

In this section, we present two nonmonotone gradient algorithms and establish global convergence and a local linear rate of convergence for the proposed methods. In the literature (Fukuda and Graña Drummond (2011, 2013), Graña Drummond and Iusem (2004), Fliege and Svaiter (2000), and Qu, Goh and Souza (2011)), several gradient methods have been proposed for solving problem (1.1) or special cases of it. In particular, Graña Drummond and Iusem (2004) proposed an extension of the projected gradient method to vector optimization problems. Under some reasonable assumptions, Fukuda and Graña Drummond (2011) established full convergence to optimal points of any sequence produced by the projected gradient method with an Armijo-like rule, no matter how poor the initial guesses may be. In addition, Fliege and Svaiter (2000) proposed a steepest descent method, and Qu, Goh and Souza (2011) developed a Quasi-Newton algorithm, for multi-objective optimization. We note that the above gradient methods are all based on a monotone line search: the trial point is accepted as the next iterate only if its objective function value is strictly lower than that at the current iterate, which can slow convergence, especially in the presence of narrow curved valleys (Grippo, Lampariello, and Lucidi, 1986). Toint (1996) pointed out that the nonmonotone technique helps when the sequence of iterates must follow the bottom of a curved narrow valley, a common occurrence in difficult nonlinear problems. Since then, many authors have proposed nonmonotone line search-based algorithms for scalar optimization problems (see, for instance, Qu, Goh, and Zhang, 2011; Ji et al., 2010); these papers indicate that nonmonotone algorithms are efficient, especially on ill-conditioned problems. In contrast, studies of nonmonotone line search-based algorithms for multi-objective optimization are relatively scarce.

3.1 Algorithm 1

Algorithm 1:

Step 0 (Initialization). Choose parameters η > 1, σ ∈ (0, 1), 0 < b < b̄, 0 < θ ≤ θ̄, and an integer M ≥ 0. Choose θ_k^0 ∈ [θ, θ̄] and bI ⪯ B_k ⪯ b̄I. Let x^0 ∈ S be the initial vector. Set k := 0.

Step 1. For j = 0, 1, …

  1a: Let B := θ_k B_k with θ_k := θ_k^0 η^j. Solve Problem (2.2) with x := x^k and obtain the optimal solution d_B(x^k). If d_B(x^k) = 0, stop: x^k is a critical point. Otherwise, set d^k := d_B(x^k).

  1b: If d^k satisfies

      f(x^k + d^k) ⪯ max_{[k−M]^+ ≤ i ≤ k} f(x^i) + σ Jf(x^k) d^k,        (3.1)

  where max_{[k−M]^+ ≤ i ≤ k} f(x^i) is taken componentwise, go to Step 2; otherwise, set j := j + 1 and return to 1a.

Step 2 (Update). Let x^{k+1} := x^k + d^k. Set k := k + 1 and go to Step 1.
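The overall loop can be sketched as follows (our simplification, not the authors' code): we take C = R_+^2, S = R^n and B_k = I, and approximate subproblem (2.2) by the min-norm convex combination of the two gradients, backtracking on θ until the nonmonotone condition (3.1) holds.

```python
def nonmonotone_gradient(f, jac, x0, M=5, sigma=1e-4, eta=2.0, theta0=1.0,
                         max_iter=200, tol=1e-8):
    """Sketch of Algorithm 1 for C = R_+^2, S = R^n, B_k = I (our simplifying
    assumptions).  f returns a 2-vector; jac returns the two gradients."""
    x = list(x0)
    hist = [f(x)]                       # memory window for (3.1)
    for _ in range(max_iter):
        g1, g2 = jac(x)
        # min-norm element of conv{g1, g2} (closed form for two objectives)
        diff = [a - b for a, b in zip(g1, g2)]
        den = sum(t * t for t in diff) or 1.0
        lam = min(1.0, max(0.0, sum(b * (b - a) for a, b in zip(g1, g2)) / den))
        gbar = [lam * a + (1 - lam) * b for a, b in zip(g1, g2)]
        if sum(t * t for t in gbar) <= tol ** 2:
            break                       # x is (approximately) critical
        theta = theta0
        while True:                     # backtracking: raise theta until (3.1) holds
            d = [-t / theta for t in gbar]
            fx_new = f([xi + di for xi, di in zip(x, d)])
            ref = [max(fi) for fi in zip(*hist[-(M + 1):])]   # componentwise max
            jd = [sum(gj * dj for gj, dj in zip(g, d)) for g in (g1, g2)]
            if all(fn <= r + sigma * j for fn, r, j in zip(fx_new, ref, jd)):
                break
            theta *= eta
        x = [xi + di for xi, di in zip(x, d)]
        hist.append(f(x))
    return x

# Hypothetical bi-objective problem: f1 = (x1-1)^2 + x2^2, f2 = (x1+1)^2 + x2^2.
f = lambda x: [(x[0] - 1) ** 2 + x[1] ** 2, (x[0] + 1) ** 2 + x[1] ** 2]
jac = lambda x: ([2 * (x[0] - 1), 2 * x[1]], [2 * (x[0] + 1), 2 * x[1]])
x = nonmonotone_gradient(f, jac, [3.0, 4.0])
# x should approach the Pareto set, the segment {(t, 0): -1 <= t <= 1}.
```

The design point worth noting is the reference value `ref`: it is the componentwise maximum over the last M + 1 objective vectors, so individual objectives may temporarily increase between iterations.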


in inequality (3.1), which can overcome the case where the sequence of iterates is to follow the bottom of curved narrow valleys (a common occurrence in difficult nonlin-

CE

ear problems) (see Grippo, Lampariello, and Lucidi (1986), Toint (1996)). In monotone line search methods, f (xk+1 )  f (xk ) is satisfied. In nonmonotone line search methods,

AC

some growth in the function value is permitted. Many researchers have shown that nonmonotone schemes in solving scalar optimization problems can improve the likelihood of finding a global optimum (see Dai (2002), Toint (1996)). Some encouraging numerical results in solving difficult nonlinear scalar optimization problems have also been reported (see Zhang and Hager (2004), Shi and Wang (2011), Lu and Zhang (2012), Ji, Zhang and Zhang (2010)). However, to the best of our knowledge, there is no research about extending nonmonotone schemes to solve vector optimization problems. When M = 0 in 9

ACCEPTED MANUSCRIPT

(3.1), it reduces to monotone line search which has been extensive used in gradient-based algorithms for solving multi-objective optimization. Our method can be viewed as an extension of one steepest gradient method studied by Fliege and Svaiter (2000). We note that local convergence is established for our method, but not studied for the methods given by Qu, Goh, and Chan (2011) and Fliege and Svaiter (2000). We explain the role

CR IP T

of all involved constants in the above algorithm, i.e., M , Bk , b, ¯b and θk . The constant M is a prefixed nonnegative integer and is used as nonmonotone line search in (3.1), which determines that the trial point xk + dk as the next iterate if its associated objective function value is strictly lower than

max

[k−M ]+ ≤i≤k

f (xi ), i.e., the maximal function value from

[k − M ]+ -th iteration to k-th iteration. At k-th iteration, Bk is used to replace B in

AN US

problem (2.2) with x := xk . b and ¯b guarantee Bk bounded below and above respectively. θk updates B in (2.2). With sufficiently large θk , it can be found in the following Lemma

3.1 that there is dB (xk ) obtained by solving (2.2) satisfying nonmontone line search (3.1). The well-posedness of Algorithm 1 is presented in the following lemma.
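The acceptance rule (3.1) with memory M can be stated compactly; a sketch for the componentwise case C = R_+^m (names hypothetical, not from the paper):

```python
def accept(f_hist, f_trial, jac_d, sigma=1e-4, M=5):
    """Nonmonotone test (3.1): f(x_k + d_k) ⪯ max_{[k-M]+ <= i <= k} f(x_i) + sigma * Jf(x_k) d_k.

    f_hist : list of past objective vectors f(x_i), most recent last
    f_trial: objective vector at the trial point x_k + d_k
    jac_d  : vector Jf(x_k) d_k (one entry per objective)
    """
    window = f_hist[-(M + 1):]                      # at most the last M+1 values
    ref = [max(comp) for comp in zip(*window)]      # componentwise maximum
    return all(ft <= r + sigma * jd for ft, r, jd in zip(f_trial, ref, jac_d))
```

With M = 0 the reference value is just f(x^k) and the rule is the monotone Armijo-type test; larger M permits temporary growth in some objectives, which is what helps in narrow curved valleys.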


Lemma 3.1 If ε = 0 and x^k ∈ S is noncritical, then there exists θ̃ > 0 such that d^k := d_{B_k(θ_k)}(x^k), where B_k(θ_k) := θ_k B_k, satisfies (3.1) whenever θ_k ≥ θ̃.

Proof. For simplicity, write d(θ) := d_{B_k(θ)}(x^k) with B_k(θ) := θB_k. From the feasibility of d(θ) for (2.4), we obtain that d(θ) ≠ 0 for all θ > 0 and, further,

    θ‖d(θ)‖ ≤ −(2uᵀJf(x^k)d(θ)) / (λ_min(B_k)‖d(θ)‖),   ∀u ∈ U(x^k),        (3.2)


where U(x) denotes the solution set of the inner maximization problem of (2.2) for the given x ∈ S. Relation (3.2) implies that θ‖d(θ)‖ is bounded over θ > 0; therefore ‖d(θ)‖ → 0 as θ → ∞. To prove the lemma, we need the following claim:

    lim inf_{θ→∞} θ‖d(θ)‖ > 0.        (3.3)

We prove (3.3) by contradiction. If (3.3) is not true, there exists a sequence {θ̄_l} with θ̄_l → ∞ such that

    θ̄_l ‖d(θ̄_l)‖ → 0  as l → ∞,        (3.4)

where d(θ̄_l) is the optimal solution of (2.4) with x := x^k and B := θ̄_l B_k. From the definition of U(x^k), the following equivalence holds:

    uᵀJf(x^k)d + (1/2)dᵀBd ≤ β, ∀u ∈ U(x^k)  ⟺  uᵀJf(x^k)d + (1/2)dᵀBd ≤ β, ∀u ∈ C*.        (3.5)

Therefore, d(θ̄_l) is also an optimal solution of the problem

    min_{β,d} β,   s.t.  uᵀJf(x^k)d + (1/2)dᵀBd ≤ β, u ∈ C*,  ‖d‖² ≤ 1,  β ≤ −ε,  x^k + d ∈ S,        (3.6)

where B = θ̄_l B_k. Together with the definition of C*, this implies that there exist a positive integer q, u^j ∈ U and λ_j > 0, j = 1, …, q, with Σ_{j=1}^q λ_j = 1, such that

    0 ∈ Σ_{j=1}^q λ_j Jf(x^k)ᵀu^j + θ̄_l B_k d(θ̄_l) + 2γ d(θ̄_l) + N_S(x^k + d(θ̄_l)),        (3.7)
    γ(1 − ‖d(θ̄_l)‖²) = 0,  γ ≥ 0,  1 − ‖d(θ̄_l)‖² ≥ 0.        (3.8)

Taking limits on both sides of (3.7) and (3.8) as l → ∞, and using the semicontinuity of N_S(·) (see Theorem 24.4 of Rockafellar, 1970) together with the relations θ̄_l‖d(θ̄_l)‖ → 0 and ‖d(θ̄_l)‖ → 0, we see that

    0 ∈ Σ_{j=1}^q λ_j Jf(x^k)ᵀu^j + N_S(x^k),        (3.9)

which means that x^k is a stationary point of (1.1). This contradicts the noncriticality of x^k; hence (3.3) holds. This conclusion, together with θd(θ)ᵀB_k d(θ) ≥ λ_min(B_k) θ‖d(θ)‖² (where λ_min(B_k) > 0 by the generation of B_k), leads to

    ‖d(θ)‖ = O(θd(θ)ᵀB_k d(θ))  as θ → ∞.        (3.10)

From the first inequality of (3.6),

    −uᵀJf(x^k)d(θ) ≥ (1/2)θd(θ)ᵀB_k d(θ) ≥ (1/2)θλ_min(B_k)‖d(θ)‖²,   ∀u ∈ C*.        (3.11)

This relation together with (3.10) means that, as θ → ∞,

    ‖d(θ)‖ = O(θd(θ)ᵀB_k d(θ)) = O(−uᵀJf(x^k)d(θ)),   ∀u ∈ C*.        (3.12)

From this result and the relation ‖d(θ)‖ → 0 as θ → ∞, we obtain

    uᵀ(f(x^k + d(θ)) − f(x^k)) = uᵀJf(x^k)d(θ) + o(‖d(θ)‖) ≤ σuᵀJf(x^k)d(θ),   ∀u ∈ C*,        (3.13)

provided θ is sufficiently large. Using this result, we prove (3.1) by contradiction: suppose there is ū ∈ C* such that, as θ → ∞,

    ūᵀ(f(x^k + d(θ)) − max_{[k−M]^+ ≤ i ≤ k} f(x^i)) > σūᵀJf(x^k)d(θ).        (3.14)

This, together with

    ūᵀ(f(x^k + d(θ)) − max_{[k−M]^+ ≤ i ≤ k} f(x^i)) ≤ ūᵀ(f(x^k + d(θ)) − f(x^k)),        (3.15)

implies that

    ūᵀ(f(x^k + d(θ)) − f(x^k)) > σūᵀJf(x^k)d(θ)        (3.16)

holds for sufficiently large θ. This contradicts (3.13), and therefore the assertion is true. □

3.1.1 Convergence

In the following lemma, we prove that the sequence of search directions {d^k} converges to zero and that the sequence of objective values {f(x^k)} converges, i.e., lim_{k→∞} d^k = 0 and lim_{k→∞} f(x^k) = f* for some f* ∈ R^m. From Lemma 2.4, the convergence of {d^k} to zero implies that a limit point of {x^k}, if it exists, is a critical point of (1.1). As Algorithm 1 is a descent-type method, i.e., at each iteration a feasible solution is accepted only when it dominates the current point, the convergence of {f(x^k)} implies that at the limit point no feasible solution can be found that dominates the current point. For the proof of Lemma 3.2, we suppose that the sequence {f(x^k)} is C-bounded from below, i.e.,

    ∃ f̄ ∈ R^m  s.t.  f̄ ⪯ f(x^k), ∀k.

The concept of C-boundedness is a generalization of boundedness for scalar-valued functions and has been used extensively in convergence proofs for gradient-based vector optimization methods (see Fukuda and Graña Drummond (2011, 2013), Qu, Goh, and Chan (2011)).

Lemma 3.2 Suppose that the sequences {d^k} and {x^k} are generated by Algorithm 1 and that the sequence {f(x^k)} is C-bounded from below. Then, as k → ∞, d^k → 0 and f(x^k) → f* for some f* ∈ R^m.

Proof. For any k, define x^{l(k)} := argmax{f(x^i) : [k−M]^+ ≤ i ≤ k}, where l(k) is an integer in [[k−M]^+, k]. From the generation of {x^k}, it is obvious that f(x^{k+1}) ⪯ f(x^{l(k)}) for all k. Together with the definition of l(k), this means that f(x^{l(k+1)}) ⪯ f(x^{l(k)}) for all k. Further, since {f(x^k)} is C-bounded from below by assumption, the following limit holds for some f* ∈ R^m:

    lim_{k→∞} f(x^{l(k)}) = f*.        (3.17)

We next show that the following limits hold for all j ≥ 1:

    lim_{k→∞} d^{l(k)−j} = 0,    lim_{k→∞} f(x^{l(k)−j}) = f*.        (3.18)

We prove this by induction. Replacing k with l(k) − 1 in (3.1) gives

    f(x^{l(k)}) ⪯ f(x^{l(l(k)−1)}) + σ Jf(x^{l(k)−1}) d^{l(k)−1}.        (3.19)

Using (3.12) with k and θ replaced by l(k) − 1 and θ_{l(k)−1}, respectively, we have

    uᵀJf(x^{l(k)−1}) d^{l(k)−1} ≤ −bθ_{l(k)−1} ‖d^{l(k)−1}‖²,   ∀u ∈ C*,        (3.20)

where the inequality uses B_{l(k)−1} ⪰ bI. Combining (3.19) and (3.20), we obtain

    uᵀ(f(x^{l(k)}) − f(x^{l(l(k)−1)})) ≤ −σbθ_{l(k)−1} ‖d^{l(k)−1}‖²,   ∀u ∈ C*.        (3.21)

Letting k → ∞ in this relation and invoking (3.17), we get lim_{k→∞} θ_{l(k)−1} ‖d^{l(k)−1}‖² = 0. Together with θ_k ≥ θ for all k, this implies lim_{k→∞} d^{l(k)−1} = 0. Then, using the assumption on {f(x^k)} and (3.17) again,

    lim_{k→∞} f(x^{l(k)−1}) = lim_{k→∞} f(x^{l(k)} − d^{l(k)−1}) = lim_{k→∞} f(x^{l(k)}) = f*.        (3.22)

Therefore, (3.18) holds for j = 1. Arguing as in the case j = 1, it is easy to show that if (3.18) holds for j, then it also holds for j + 1.

Finally, we show that the lemma is true. From the definition of l(k), it is obvious that, for k ≥ M + 1, k − M − 1 = l(k) − j for some 1 ≤ j ≤ M + 1, which together with the first limit in (3.18) gives lim_{k→∞} d^k = lim_{k→∞} d^{k−M−1} = 0. Further, we note that

    x^{l(k)} = x^{k−M−1} + Σ_{j=1}^{l̄_k} d^{l(k)−j},   ∀k ≥ M + 1,

where l̄_k := l(k) − (k − M − 1) ≤ M + 1. This result, together with (3.18) and the assumption on {f(x^k)}, yields lim_{k→∞} f(x^k) = lim_{k→∞} f(x^{k−M−1}) = f*. Therefore, the assertion of the lemma is true. □

We next show that Algorithm 1 is globally convergent.

Theorem 3.3 Suppose that the sequence {x^k} is generated by Algorithm 1 and that the sequence {f(x^k)} is C-bounded from below. Then any accumulation point of {x^k} is a critical point of (1.1).

Proof. We argue by contradiction: suppose there exists an accumulation point x* of {x^k} that is noncritical for (1.1), and let {x^k}_{k∈K} → x* be a corresponding subsequence. We first prove that {θ_k}_{k∈K} is bounded. If not, there is a subsequence of {θ_k}_{k∈K} that goes to ∞; without loss of generality, assume {θ_k}_{k∈K} → ∞. Together with the choice of θ_k^0, this implies that there exists an index k̄ ≥ 0 such that θ_k > θ_k^0 for all k ∈ K with k ≥ k̄. For simplicity, define θ̄_k := θ_k/η and d^k(θ) := d_{B_k(θ)}(x^k) for k ∈ K and θ > 0, where B_k(θ) = θB_k. From the choice of θ_k in steps 1a and 1b, there exists some ū ∈ U such that

    ūᵀf(x^k + d^k(θ̄_k)) > max_{[k−M]^+ ≤ i ≤ k} ūᵀf(x^i) + σūᵀJf(x^k)d^k(θ̄_k),   ∀k ∈ K.        (3.23)

Similarly to (3.2), we have

    θ̄_k ‖d^k(θ̄_k)‖ ≤ −(2uᵀJf(x^k)d^k(θ̄_k)) / (λ_min(B_k)‖d^k(θ̄_k)‖),   ∀k ∈ K, u ∈ U(x^k).        (3.24)

Together with B_k ⪰ bI and {x^k}_{k∈K} → x*, this shows that {θ̄_k‖d^k(θ̄_k)‖}_{k∈K} is bounded; hence {‖d^k(θ̄_k)‖}_{k∈K} → 0 as {θ̄_k}_{k∈K} → ∞. Further, similarly to the proof of (3.3), we have

    lim inf_{k∈K, k→∞} θ̄_k ‖d^k(θ̄_k)‖ > 0,        (3.25)

which, together with an argument similar to the proof of (3.10), leads to

    ‖d^k(θ̄_k)‖ = O(θ̄_k d^k(θ̄_k)ᵀB_k d^k(θ̄_k))  as k ∈ K → ∞.        (3.26)

Combining this with a proof similar to that of (3.1) in Lemma 3.1, we obtain that

    f(x^k + d^k(θ̄_k)) ⪯ max_{[k−M]^+ ≤ i ≤ k} f(x^i) + σ Jf(x^k) d^k(θ̄_k)        (3.27)

holds for sufficiently large k ∈ K. This implies that, for all u ∈ U,

    uᵀf(x^k + d^k(θ̄_k)) ≤ max_{[k−M]^+ ≤ i ≤ k} uᵀf(x^i) + σuᵀJf(x^k)d^k(θ̄_k)        (3.28)

holds for sufficiently large k ∈ K, which contradicts (3.23). Therefore, {θ_k}_{k∈K} is bounded.

Finally, since d^k = d^k(θ_k) is an optimal solution of (2.4) with x = x^k and B = θ_k B_k, there exist a positive integer q, u^j ∈ U and λ_j > 0, j = 1, …, q, with Σ_{j=1}^q λ_j = 1, such that

    0 ∈ Σ_{j=1}^q λ_j Jf(x^k)ᵀu^j + θ_k B_k d^k + 2γ d^k + N_S(x^k + d^k),        (3.29)
    γ(1 − ‖d^k‖²) = 0,  γ ≥ 0,  1 − ‖d^k‖² ≥ 0.        (3.30)

Taking limits on both sides of (3.29) and (3.30) as k ∈ K → ∞, and using the semicontinuity of N_S(·) (see Theorem 24.4 of Rockafellar, 1970), the boundedness of θ_k, and the relation d^k → 0 (Lemma 3.2), we see that

    0 ∈ Σ_{j=1}^q λ_j Jf(x*)ᵀu^j + N_S(x*),        (3.31)

which means that x* is a stationary point of (1.1). This contradicts the assumption that x* is noncritical, so the conclusion of the theorem holds. □

In the rest of this section, we analyze the local linear convergence rate of Algorithm 1 under the following assumption, which is a generalization of the one made for scalar optimization by Tseng and Yun (2009). In the rest of this paper, we denote by S̄ the set of critical points of problem (1.1).


Assumption 1 (a) S̄ ≠ ∅ and, for any ξ ∈ R^m with min_{x∈S} f(x) ⪯ ξ, there exist ε > 0 and ω > 0 such that

    dist(x, S̄) ≤ ω ‖d_I(x)‖  whenever  f(x) ⪯ ξ and ‖d_I(x)‖ ≤ ε.

(b) There exists δ > 0 such that

    ‖x − y‖ ≥ δ  whenever  x ∈ S̄, y ∈ S̄, f(x) ≠ f(y).

Assumption 1 is a generalization of Assumptions A and B given by Luo and Tseng (1993) for scalar optimization. Assumption 1(a) is a local Lipschitzian error bound assumption: the distance from x to S̄ is locally of the order of the norm of the residual d_I(x) at x (d_B(x) with B = I). Error bounds of this kind have been studied extensively in the scalar optimization literature; when f is polyhedral, Assumption 1(a) holds by the corollary given by Robinson (1981). Assumption 1(b) means that the iso-cost surfaces of f restricted to S̄ are "properly separated" (see Tseng and Yun, 2009); it holds whenever f takes only finitely many values on S̄, or whenever the connected components of S̄ are properly separated from each other. Assumption 1 is used to prove the local linear convergence rate of Algorithm 1. The following theorem shows that Algorithm 1 converges asymptotically at a linear rate; its proof is inspired by the proof of a similar local convergence result for scalar optimization by Tseng and Yun (2009).

Theorem 3.4 Suppose that {xk } is generated by Algorithm 1, Assumption 1 holds, and

the sequence {f (xk )} is C−bounded from below. Then, there exists some constant c ∈

AC

(0, 1) such that

f (xl(k) ) − f ∗  c(f (xl(l(k)−1) ) − f ∗ ),

(3.32)

where l(k) is defined as in Lemma 3.2, and f ∗ = lim f (xk ). k→∞

Proof. From the proof of Theorem 3.3, {θk } is bounded. This together with Lemma

3.1 and the specific choice of θk implies that θˆ := sup θk < ∞. Then, according to k

bI 5 Bk 5 ¯bI and θk ≥ θ, we have θbI 5 Bk (θk ) 5 θˆ¯bI, where Bk (θk ) := θk Bk . Using this 16

ACCEPTED MANUSCRIPT

relation, Lemma 3.2 of Tseng and Yun (2009), bI 5 Bk , and dk = dBk (θk ) (xk ), we derive that k dI (xk ) k= O(k dk k),

(3.33)

which along with Lemma 3.2 means {dI (xk )} → 0. Hence, for any  > 0, there exists some ¯ This result together with Assumption index k¯ such that k dI (xl(k)−1 ) k≤  for all k ≥ k.

CR IP T

0 ¯ and some 1(a) and (3.33) implies that there exists some index k , some x¯l(k)−1 ∈ S,

constant c1 > 0 such that

0

k xl(k)−1 − x¯l(k)−1 k≤ c1 k dl(k)−1 k, ∀k ≥ k . Since {dk } → 0 and

[k−1]+

l(k+1)−2

−x

l(k)−1

k≤

X

i

k d k≤

X

k di k,

AN US

kx

l(k+1)−1

i=l(k)−1

(3.34)

i=[k−M −1]+

(3.35)

we have that k xl(k+1)−1 − xl(k)−1 k→ 0. Then, from this result, (3.34), and Lemma 3.2, the following inequalities hold,

M

k x¯l(k+1)−1 − x¯l(k)−1 k≤k x¯l(k+1)−1 − xl(k+1)−1 k + k xl(k)−1 − x¯l(k)−1 k + k xl(k+1)−1 − x¯l(k)−1 k (3.36)

ED

≤ c1 k dl(k+1)−1 k +c1 k dl(k)−1 k + k xl(k+1)−1 − x¯l(k)−1 k→ 0.

0 According to (3.36) and Assumption 1(b), there exists an index kˆ ≥ k and f¯ ∈ Rm such

PT

that

ˆ f (¯ xl(k)−1 ) = f¯, ∀k ≥ k.

(3.37)

CE

Then, similar with the proof of Lemma 5.1 of Tseng and Yun (2009), we can prove that f ∗ = lim f (xk ) = lim inf f (xl(k)−1 )  f¯. k→∞

k→∞

(3.38)

AC

Further, for any u ∈ C ∗ , there exists L and x¯k lying on the segment joining xl(k) with

x¯l(k)−1 such that for k ≥ kˆ

 uT (f (xl(k) − f¯) = uT (f (xl(k) − f (xl(k)−1 ) = uT Jf (¯ xk )(xl(k) − x¯l(k)−1 )  = uT Jf (¯ xk ) − J(xl(k)−1 ) (xl(k) − x¯l(k)−1 ) − (Bl(k)−1 (θl(k)−1 )dl(k)−1 )T (xl(k) − x¯l(k)−1 ) h i  l(k)−1 l(k)−1 T l(k) l(k)−1 + J(x )u + Bl(k)−1 (θl(k)−1 )d (x − x¯ )

≤ L k x¯k − xl(k)−1 kk xl(k) − x¯l(k)−1 k +θˆ¯b k dl(k)−1 kk xl(k) − x¯l(k)−1 k, 17

(3.39)

ACCEPTED MANUSCRIPT

where the first term in the inequality of (3.39) comes from the assumption about f , the second one comes from the non-positivity of the third term of the third equality and ˆ Bk (θk ) 5 θˆ¯bI. It comes from the choice of x¯k and (3.34) that, for k ≥ k, k x¯k − xl(k)−1 k≤k xl(k) − x¯l(k)−1 k

This result together with (3.39), Bk (θk ) ≥ θbI, and

(3.40)

CR IP T

≤k xl(k) − xl(k)−1 k + k xl(k)−1 − x¯l(k)−1 k≤ (1 + c1 ) k dl(k)−1 k .

−uT Jf (xl(k)−1 )dl(k)−1 ≥ (dl(k)−1 )T Bdl(k)−1 ≥ λmin (B) k dl(k)−1 k2 , ∀u ∈ C ∗ , implies that for k ≥ kˆ and u ∈ C ∗ , there exists some constant c2 > 0 such that,

AN US

uT (f (xl(k) ) − v) ≤ −c2 uT Jf (xl(k)−1 )dl(k)−1 .

(3.41)

ˆ From (3.19) and (3.41), we have that for c3 := c2 /σ and k ≥ k,

 uT (f (xl(k) ) − v) ≤ c3 uT f (xl(l(k)−1) − f (xl(k) )) , ∀u ∈ C ∗ .

(3.42)

M

Taking limits on both sides of (3.42) and using lim f (xl(k) ) = f ∗ , we have that uT f ∗ ≤ k→∞

uT v, ∀u ∈ C ∗ , which along with (3.38) means that v = f ∗ . According to this result and

ED

upon rearranging terms of (3.42), we get the following inequality, (3.43)

PT

ˆ ∀u ∈ C ∗ uT (f (xl(k) ) − f ∗ ) ≤ cuT (f (xl(l(k)−1) ) − f ∗ ), ∀k ≥ k,

where c := c3 /(1 + c3 ) ∈ (0, 1). This result obviously leads to the conclusion of this

Algorithm 2

AC

3.2

CE

theorem.

We next propose the second nonmonotone gradient method for (1.1) as follows.

Algorithm 2:
Step 0. Initialization: Choose parameters η ∈ (0, 1), σ ∈ (0, 1), 0 < α < ᾱ, 0 < b ≤ b̄, and an integer M ≥ 0. Let x^0 ∈ S be the initial vector. Set k := 0.
Step 1. Choose bI ⪯ B_k ⪯ b̄I. Solve Problem (2.2) with x := x^k. If d_{B_k}(x^k) = 0, then stop: x^k is a critical point. Otherwise, obtain d^k := d_{B_k}(x^k).
Step 2. Choose α_k^0 ∈ [α, ᾱ]. Find the smallest integer j ≥ 0 such that α_k := α_k^0 η^j satisfies

f(x^k + α_k d^k) ⪯ max_{[k−M]_+ ≤ i ≤ k} f(x^i) + σ α_k Jf(x^k) d^k,  x^k + α_k d^k ∈ S.   (3.44)

Step 3. Update: Let x^{k+1} := x^k + α_k d^k. Set k := k + 1 and return to Step 1.

Remark. Algorithm 2 differs from Algorithm 1 in two respects. First, B_k can be chosen arbitrarily subject to the constraint bI ⪯ B_k ⪯ b̄I, whereas the one in Algorithm 1 must be updated at each iteration. Second, the step size α_k satisfying (3.44) is used in the iterations of Algorithm 2, while the step size in Algorithm 1 is always taken to be 1. Note that Algorithm 2 is closely related to the method presented by Tseng and Yun (2009): when f reduces to a scalar-valued function and M = 0, Algorithm 2 reduces to a gradient descent method. However, since Algorithm 2 is in general nonmonotone when M ≥ 1 and aims at vector optimization problems, most of the global and local convergence proofs given by Tseng and Yun (2009) do not carry over to Algorithm 2. In addition, Algorithm 2 can be viewed as an extension of a projected gradient method proposed by Bello Cruz, Lucambio Pérez, and Melo (2011): when M = 0, the term max_{[k−M]_+ ≤ i ≤ k} f(x^i) reduces to f(x^k) and Algorithm 2 becomes a monotone method. As in Algorithm 1, we use the nonmonotone strategy with M ≥ 1, because the nonmonotone technique helps to overcome the case where the sequence of iterates must follow the bottom of curved narrow valleys, a common occurrence in difficult nonlinear problems (see Toint, 1996).
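The acceptance test (3.44) of Step 2 can be sketched in a few lines. The following is our own minimal illustration, not the authors' implementation: it backtracks α_k = α_k^0 η^j until f(x^k + α_k d^k) lies componentwise below the nonmonotone reference value; the feasibility check x^k + α_k d^k ∈ S is omitted for brevity.

```python
import numpy as np

def nonmonotone_step(f, x, d, jf_d, history, sigma=0.5, eta=0.5,
                     alpha0=1.0, max_iter=50):
    """Backtracking search for the nonmonotone condition (3.44).

    f       : callable R^n -> R^m (vector objective)
    x, d    : current iterate and search direction
    jf_d    : Jf(x) @ d, the m-vector of directional derivatives
    history : objective vectors f(x^i) over the window [k-M]_+ <= i <= k
    Accepts alpha once f(x + alpha d) <= max_i f(x^i) + sigma alpha Jf(x)d
    holds componentwise (the order induced by C = R^m_+).
    """
    ref = np.max(np.vstack(history), axis=0)  # componentwise nonmonotone reference
    alpha = alpha0
    for _ in range(max_iter):
        if np.all(f(x + alpha * d) <= ref + sigma * alpha * jf_d):
            return alpha                      # alpha = alpha0 * eta^j, smallest j
        alpha *= eta                          # shrink and retry
    raise RuntimeError("line search failed")
```

For instance, with f(x) = (x_1² + x_2², (x_1 − 1)² + x_2²), x = (2, 0), d = (−1, 0), and history = [f(x)], the full step α = 1 is accepted.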

The following result shows that Algorithm 2 is well defined.

Lemma 3.5 If x^k is noncritical for (1.1), there exists α̃ > 0 such that d^k := d_{B_k}(x^k) satisfies (3.44) whenever α_k ∈ (0, α̃].

Proof. As in the proof of Lemma 2.1 of Tseng and Yun (2009), we can prove that

f(x^k + α d^k) ⪯ f(x^k) + α Jf(x^k) d^k + o(α)   (3.45)
⪯ max_{[k−M]_+ ≤ i ≤ k} f(x^i) + α Jf(x^k) d^k + o(α), ∀α ∈ (0, 1].   (3.46)

This together with Jf(x^k) d^k ≺ 0 (since x^k is noncritical) implies that the assertion holds.

3.2.1 Convergence

To prove the global and local convergence results, we need the following technical result: the sequence {α_k d^k} converges to zero and the sequence of objective values {f(x^k)} converges.

Lemma 3.6 Suppose that the sequences {d^k} and {x^k} are generated by Algorithm 2 and the sequence {f(x^k)} is C-bounded from below. Then the following two limits hold: lim_{k→∞} α_k d^k = 0 and lim_{k→∞} f(x^k) = f* for some f* ∈ R^m.

Proof. From the definition of d^k and B_k ⪰ bI, for any u ∈ C* we have

u^T Jf(x^k) d^k ≤ −(1/2) (d^k)^T B_k d^k ≤ −(1/2) b ‖d^k‖².   (3.47)

This result, along with the relation α_k ≤ α_k^0 ≤ ᾱ, means that

α_k² ‖d^k‖² ≤ −(2ᾱ α_k / b) u^T Jf(x^k) d^k, ∀u ∈ C*.   (3.48)

Define l(k) as in the proof of Lemma 3.2. Then, similarly to the proof of (3.17), we can prove that {x^k} satisfies (3.17) for some f*. We next show that the following two limits hold for all j ≥ 1:

lim_{k→∞} α_{l(k)−j} d^{l(k)−j} = 0,  lim_{k→∞} f(x^{l(k)−j}) = f*.   (3.49)

We prove this by induction. First, replacing k with l(k) − 1 in (3.44), we have

f(x^{l(k)}) ⪯ f(x^{l(l(k)−1)}) + σ α_{l(k)−1} Jf(x^{l(k)−1}) d^{l(k)−1},   (3.50)

which together with (3.17) and (3.48) implies that lim_{k→∞} α_{l(k)−1} Jf(x^{l(k)−1}) d^{l(k)−1} = 0. From this result and (3.48) again, the first limit of (3.49) holds for j = 1. From this result, (3.17), and the assumption on f, it is easy to show that the second limit of (3.49) holds for j = 1. We next show that if (3.49) holds for j, then it also holds for j + 1. From (3.44), we have

f(x^{l(k)−j}) ⪯ f(x^{l(l(k)−j−1)}) + σ α_{l(k)−j−1} Jf(x^{l(k)−j−1}) d^{l(k)−j−1},   (3.51)

which along with (3.17) and the induction assumption lim_{k→∞} f(x^{l(k)−j}) = f* implies lim_{k→∞} α_{l(k)−j−1} Jf(x^{l(k)−j−1}) d^{l(k)−j−1} = 0. From this result and (3.48), we get

lim_{k→∞} α_{l(k)−j−1} d^{l(k)−j−1} = 0,

which together with the induction assumption lim_{k→∞} f(x^{l(k)−j}) = f* and the assumption on {f(x^k)} yields lim_{k→∞} f(x^{l(k)−j−1}) = f*. Therefore, (3.49) holds for j + 1. The assertion of this lemma then follows from (3.49) and a proof similar to that of Lemma 3.2.

We next prove that Algorithm 2 is globally convergent.

Theorem 3.7 Suppose that the sequence {x^k} is generated by Algorithm 2 and the sequence {f(x^k)} is C-bounded from below. Then any accumulation point of the sequence {x^k} is a critical point of (1.1).

Proof. We prove this theorem by contradiction: suppose that there is an accumulation point x* of {x^k} that is a noncritical point of (1.1). Denote by K the subsequence such that {x^k}_{k∈K} → x*. We first show that lim inf_{k∈K, k→∞} ‖d^k‖ > 0. If this is not true, then without loss of generality we can assume that {d^k}_{k∈K} → 0. Since d^k is an optimal solution of (2.4) with x = x^k and B = B_k, there exist a positive integer q, u_j ∈ U, and λ_j > 0, j = 1, ..., q, with Σ_{j=1}^q λ_j = 1 such that

0 ∈ Σ_{j=1}^q λ_j Jf(x^k)^T u_j + B_k d^k + 2γ d^k + N_S(x^k + d^k),   (3.52)

γ (1 − ‖d^k‖²) = 0, γ ≥ 0, 1 − ‖d^k‖² ≥ 0.   (3.53)

By taking limits on both sides of (3.52) and (3.53) as k ∈ K → ∞, using the semicontinuity of N_S(·) (see Theorem 24.4 of Rockafellar, 1970) and the assumed relation {d^k}_{k∈K} → 0, we see that

0 ∈ Σ_{j=1}^q λ_j Jf(x*)^T u_j + N_S(x*),   (3.54)

which means that x* is a stationary point of (1.1). This contradicts the noncriticality assumption on x* made at the beginning of this proof, and the contradiction implies that lim inf_{k∈K, k→∞} ‖d^k‖ > 0 holds. Similarly to the proof of (3.2) in Lemma 3.1, we obtain

‖d^k‖ ≤ −2 u^T Jf(x^k) d^k / (λ_min(B_k) ‖d^k‖), ∀k ∈ K, ∀u ∈ C*.   (3.55)

From (3.55), {x^k}_{k∈K} → x*, B_k ⪰ bI, and lim inf_{k∈K, k→∞} ‖d^k‖ > 0, we conclude that {d^k}_{k∈K} is bounded. Further, from (3.48), we have lim sup_{k∈K, k→∞} u^T Jf(x^k) d^k < 0, ∀u ∈ C*. This together with Lemma 3.6 implies that {α_k}_{k∈K} → 0. Then, from the specific choice of α_k, there exists some index k̄ such that α_k < α_k^0 and α_k < η for all k ∈ K with k ≥ k̄. Define ᾱ_k := α_k / η. Then {ᾱ_k}_{k∈K} → 0 and 0 < ᾱ_k ≤ 1 for all k ∈ K. From the choice of α_k in Algorithm 2, we obtain that, for all k ∈ K with k ≥ k̄, there exists ū ∈ C* such that

ū^T f(x^k + ᾱ_k d^k) > max_{[k−M]_+ ≤ i ≤ k} ū^T f(x^i) + σ ū^T ᾱ_k Jf(x^k) d^k.   (3.56)

On the other hand, by the assumption on f, lim sup_{k∈K, k→∞} u^T Jf(x^k) d^k < 0, ∀u ∈ C*, and the boundedness of {d^k}_{k∈K}, we obtain that, for sufficiently large k ∈ K and all u ∈ C*,

u^T f(x^k + ᾱ_k d^k) = u^T f(x^k) + ᾱ_k u^T Jf(x^k) d^k + o(ᾱ_k ‖d^k‖)
≤ max_{[k−M]_+ ≤ i ≤ k} u^T f(x^i) + σ ᾱ_k u^T Jf(x^k) d^k + (1 − σ) ᾱ_k u^T Jf(x^k) d^k + o(ᾱ_k ‖d^k‖)
< max_{[k−M]_+ ≤ i ≤ k} u^T f(x^i) + σ ᾱ_k u^T Jf(x^k) d^k.   (3.57)

(3.57) clearly contradicts (3.56), and this contradiction leads to the conclusion of the theorem.

We next show the local linear rate of convergence of Algorithm 2.

Theorem 3.8 Suppose that {x^k} is generated by Algorithm 2, Assumption 1 holds, and the sequence {f(x^k)} is C-bounded from below. Then there exists some constant c ∈ (0, 1) such that

f(x^{l(k)}) − f* ⪯ c (f(x^{l(l(k)−1)}) − f*),   (3.58)

where l(k) is defined as in Lemma 3.2 and f* = lim_{k→∞} f(x^k).

Proof. From the specific choice of α_k and Lemma 3.6 of Lu and Zhang (2012), we see that inf_k α_k > 0. This result, along with Lemma 3.6, means that {d^k} → 0. Similarly to the proof of (3.33), the following equality holds:

‖d_I(x^k)‖ = O(‖d^k‖).   (3.59)

Hence {d_I(x^k)} → 0, and by an argument similar to that in the proof of Theorem 3.4, there exist c_1 > 0, some index k̂, v ∈ R^m, and x̄^{l(k)−1} ∈ S̄ such that

‖x^{l(k)−1} − x̄^{l(k)−1}‖ ≤ c_1 ‖d^{l(k)−1}‖,  f(x̄^{l(k)−1}) = v, ∀k ≥ k̂.   (3.60)

Then, similarly to the proof of (3.38), we can prove that (3.38) holds for {x^k}, f*, and v. According to the assumption on f and bI ⪯ B_k ⪯ b̄I, for any u ∈ C* there exist L and x̄^k lying on the segment joining x^{l(k)} with x̄^{l(k)−1} such that for k ≥ k̂,

u^T (f(x^{l(k)}) − v) = u^T (f(x^{l(k)}) − f(x̄^{l(k)−1})) = u^T Jf(x̄^k)(x^{l(k)} − x̄^{l(k)−1})
= u^T (Jf(x̄^k) − Jf(x^{l(k)−1}))(x^{l(k)} − x̄^{l(k)−1}) − (B_{l(k)−1} d^{l(k)−1})^T (x^{l(k)} − x̄^{l(k)−1})
  + [Jf(x^{l(k)−1})^T u + B_{l(k)−1} d^{l(k)−1}]^T (x^{l(k)} − x̄^{l(k)−1})
≤ L ‖x̄^k − x^{l(k)−1}‖ ‖x^{l(k)} − x̄^{l(k)−1}‖ + b̄ ‖d^{l(k)−1}‖ ‖x^{l(k)} − x̄^{l(k)−1}‖
  + (α_{l(k)−1} − 1) ((d^{l(k)−1})^T B_{l(k)−1} d^{l(k)−1} + u^T Jf(x^{l(k)−1}) d^{l(k)−1}).   (3.61)

From the choice of x̄^k, (3.34), and α_k ≤ 1, we have, for k ≥ k̂,

‖x̄^k − x^{l(k)−1}‖ ≤ ‖x^{l(k)} − x^{l(k)−1}‖ + ‖x^{l(k)−1} − x̄^{l(k)−1}‖ ≤ (1 + c_1) ‖d^{l(k)−1}‖.   (3.62)

Similarly, for k ≥ k̂, ‖x^{l(k)} − x̄^{l(k)−1}‖ ≤ (1 + c_1) ‖d^{l(k)−1}‖. It then follows from these inequalities, B_k ⪰ bI, α_k ≤ 1, and (3.61) that, for k ≥ k̂ and any u ∈ C*,

u^T (f(x^{l(k)}) − v) ≤ −c_2 u^T Jf(x^{l(k)−1}) d^{l(k)−1}

for some constant c_2 > 0. The remaining proof follows similarly to that of Theorem 3.4.

In the rest of this section, we discuss the convergence of Algorithm 2 for a special class of vector optimization problems with C := R^m_+, without Assumption 1.

Theorem 3.9 Suppose that the following conditions hold:
1. C := R^m_+;
2. B_k := I for all k in Algorithm 2;
3. M := 0 in Algorithm 2.

Then, for all x ∈ S and the sequence {x^k} generated by Algorithm 2, there exists {u_j^k}_{j=1}^m ⊂ [0, 1] satisfying Σ_{j=1}^m u_j^k = 1 such that

‖x^{k+1} − x‖² − ‖x^k − x‖² ≤ 2α_k Σ_{j=1}^m u_j^k ∇f_j(x^k)^T (x − x^k) + (2/σ) Σ_{j=1}^m (f_j(x^k) − f_j(x^{k+1})).

Proof. When C = R^m_+ and B_k = I, the subproblem at the k-th iteration of Algorithm 2 reduces to

min_d max_{j∈{1,...,m}} ∇f_j(x^k)^T d + (1/2) ‖d‖², s.t. ‖d‖ ≤ 1, x^k + d ∈ S.   (3.63)

Because the objective function of this problem is strongly convex, the optimal solution d^k satisfies the following inequality:

(d − d^k)^T v^k ≥ 0, ∀d with ‖d‖ ≤ 1 and x^k + d ∈ S,   (3.64)

where v^k ∈ ∂φ_k(d^k) and φ_k(d) := max_{j∈{1,...,m}} ∇f_j(x^k)^T d + (1/2) ‖d‖². According to the definition of φ_k, there exist u_j^k ≥ 0, j = 1, ..., m, with Σ_{j=1}^m u_j^k = 1, such that

v^k = d^k + Σ_{j=1}^m u_j^k ∇f_j(x^k).   (3.65)

By simple computation, for any x ∈ S define

γ_k := ‖x^k − x‖² − ‖x^{k+1} − x‖² + ‖x^{k+1} − x^k‖² = 2 (x^{k+1} − x^k)^T (x − x^k).

This together with x^{k+1} = x^k + α_k d^k implies that

γ_k = 2α_k (x − x^k)^T d^k = 2α_k (x − x^k − d^k)^T d^k + 2α_k ‖d^k‖².   (3.66)

From (3.65) and (3.66),

γ_k − 2α_k ‖d^k‖² = 2α_k (x − x^k − d^k)^T (v^k − Σ_{j=1}^m u_j^k ∇f_j(x^k))
= 2α_k ((x − x^k − d^k)^T v^k − (x − x^k − d^k)^T Σ_{j=1}^m u_j^k ∇f_j(x^k))
≥ 2α_k ((x^k − x)^T Σ_{j=1}^m u_j^k ∇f_j(x^k) + (d^k)^T Σ_{j=1}^m u_j^k ∇f_j(x^k)),   (3.67)

where the inequality comes from (3.64). The above inequality implies that

‖x^k − x‖² − ‖x^{k+1} − x‖² + α_k (α_k − 2) ‖d^k‖² = γ_k − 2α_k ‖d^k‖²
≥ 2α_k ((x^k − x)^T Σ_{j=1}^m u_j^k ∇f_j(x^k) + (d^k)^T Σ_{j=1}^m u_j^k ∇f_j(x^k)).   (3.68)

(3.68) together with (3.44) and M = 0 means that

‖x^{k+1} − x‖² − ‖x^k − x‖² ≤ 2α_k (x − x^k)^T Σ_{j=1}^m u_j^k ∇f_j(x^k) + (2/σ) Σ_{j=1}^m u_j^k (f_j(x^k) − f_j(x^{k+1})).   (3.69)

It follows from f(x^{k+1}) ⪯ f(x^k) and u_j^k ∈ [0, 1] for all j and k that

Σ_{j=1}^m u_j^k (f_j(x^k) − f_j(x^{k+1})) ≤ Σ_{j=1}^m (f_j(x^k) − f_j(x^{k+1})).   (3.70)

Therefore, it follows from (3.69) and (3.70) that the assertion holds.

4 Multiobjective Portfolio Optimization

In this section, we discuss an application of the proposed algorithms to multi-criteria portfolio optimization problems.

4.1 Models

Suppose that a capital market has a set of risky assets, each with uncertain returns. Let x_j ∈ R_+, j = 1, ..., n, denote the amount invested in the j-th asset, where Σ_{j=1}^n x_j = 1, and x_j ≥ 0, j = 1, ..., n, means that short selling is not permitted. The stochastic returns of the investment x := (x_1, ..., x_n)^T ∈ R^n in the assets are given by r(ω) := (r_1(ω), ..., r_n(ω))^T : Ω → R^n, where Ω is the known support. The portfolio's return is then the random variable R̃(x) := Σ_{j=1}^n x_j r_j(ω), ω ∈ Ω. The mean-variance model selects a portfolio by maximizing the expected return while minimizing the variance of R̃(x) subject to some constraints, that is, it solves the following bicriteria optimization problem:

min_{x∈R^n} f(x) := (−E[R̃(x)], σ²(R̃(x)))
s.t. Σ_{j=1}^n x_j = 1, x_j ≥ 0, j = 1, ..., n,   (4.1)

where E[R̃(x)] and σ²(R̃(x)) := E[(R̃(x) − E[R̃(x)])²] are the expectation and variance of R̃(x), respectively. Writing the portfolio return as R̃(x) := x_1 R_1 + ... + x_n R_n for the choice x := (x_1, ..., x_n), we have σ²(R̃(x)) = Σ_{i=1}^n Σ_{j=1}^n x_i x_j σ_{ij}, where σ_{ij} is the covariance of R_i and R_j; thus the variance is a quadratic function of x_1, ..., x_n. Since R̃ is linear in x, the variance σ²(R̃(x)) is convex in x, making the above bicriteria optimization problem convex.

4.2 Numerical Tests
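Before turning to the tests, the scenario-based version of the objective in (4.1) can be sketched as follows. This is our own minimal illustration under assumed, made-up data (equiprobable scenarios with uniform returns); it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: L equiprobable return scenarios for n assets
# (the tests below use L = 200 scenarios with p_j = 1/L).
L, n = 200, 5
R = rng.uniform(0.0, 1.0, size=(L, n))       # r_j(omega_i): scenario i, asset j

def portfolio_objectives(x, R):
    """Scenario estimate of the bicriteria objective f(x) = (-E[R(x)], Var[R(x)]) of (4.1)."""
    returns = R @ x                          # portfolio return in each scenario
    mean = returns.mean()                    # E[R(x)] under p_j = 1/L
    cov = np.cov(R, rowvar=False, bias=True) # scenario covariance matrix (sigma_ij)
    var = x @ cov @ x                        # quadratic, hence convex, in x
    return np.array([-mean, var])

x = np.full(n, 1.0 / n)                      # equal-weight feasible point of S
f = portfolio_objectives(x, R)
```

The quadratic form x^T Σ x reproduces exactly the sample variance of the scenario returns R @ x, matching the covariance expression for σ²(R̃(x)) above.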

In our numerical tests, we use the following procedure to solve Problem (2.2). Given a point x ∈ S, solve the following linear optimization problem:

max_{u∈C*} u^T f(x), s.t. Σ_{i=1}^m u_i = 1, u_i ≥ 0, i = 1, ..., m.   (4.2)

Let U(x) be the solution set of the above problem for the given x ∈ S. Given a positive definite matrix B ∈ R^{n×n} and a sufficiently small ε > 0, we denote by d_B(x) the solution of the following problem, if an optimal solution exists:

min_{β,d} β + (1/2) d^T B d, s.t. u^T Jf(x) d ≤ β, u ∈ U(x), ‖d‖ ≤ 1, β ≤ −ε, x + d ∈ S.   (4.3)

Problem (4.3) is denoted by Sub_B(x, ε). The constraint β ≤ −ε is used as a termination rule: for a given x ∈ S, when there is no direction d such that β ≤ −ε, we take x as an approximate critical point. Differently from Lemma 2.4, the following lemma provides an alternative characterization of criticality.

Lemma 4.1 For any positive definite matrix B ∈ R^{n×n}, x ∈ S is a critical point of Problem (1.1) if and only if (4.3) has no solution for any given ε > 0.

Proof. From Definition 2.2, if x ∈ S is not a critical point, then there is a direction d ∈ R^n such that Jf(x) d ∈ −int(C), i.e., Jf(x) d ≺ 0. This implies that if x ∈ S is a critical point of Problem (1.1), then for any d ∈ R^n there is u ∈ C* with u^T Jf(x) d ≥ 0. Therefore, the feasible set of Problem (4.3) is empty and (4.3) has no solution. Conversely, if (4.3) has no solution for any given ε > 0, then x is a critical point of Problem (1.1).
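For the special case C = R^m_+ (where the constraints u^T Jf(x) d ≤ β reduce to ∇f_j(x)^T d ≤ β over the objective gradients), subproblem (4.3) can be sketched with a general-purpose solver. The following is our own illustration, not the authors' Matlab code; the constraint x + d ∈ S is dropped for brevity, and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def descent_direction(grads, b_mat=None, eps=1e-5):
    """Sketch of subproblem (4.3) with C = R^m_+.

    grads : (m, n) array whose rows are the objective gradients at x.
    Returns (beta, d), or (None, None) when the solver fails, which is
    how infeasibility of beta <= -eps (approximate criticality) shows up.
    """
    m, n = grads.shape
    B = np.eye(n) if b_mat is None else b_mat

    def obj(z):                               # z = (beta, d)
        beta, d = z[0], z[1:]
        return beta + 0.5 * d @ B @ d

    cons = [{"type": "ineq", "fun": lambda z, g=g: z[0] - g @ z[1:]}     # g^T d <= beta
            for g in grads]
    cons.append({"type": "ineq", "fun": lambda z: 1.0 - z[1:] @ z[1:]})  # ||d|| <= 1
    cons.append({"type": "ineq", "fun": lambda z: -eps - z[0]})          # beta <= -eps

    z0 = np.concatenate(([-eps], -0.1 * grads.mean(axis=0)))  # near-feasible start
    res = minimize(obj, z0, constraints=cons, method="SLSQP")
    return (res.x[0], res.x[1:]) if res.success else (None, None)
```

For instance, with gradients (1, 0) and (0, 1) the optimum is β = −1/2 with d = (−1/2, −1/2), a strict descent direction for both objectives.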

For simplicity, for a given ε ≥ 0 we denote by Sub_B(x, ε) both the problem and its optimal value (with the convention that Sub_B(x, ε) = +∞ if the problem is infeasible). If ε = 0 in Problem (4.3), then its feasible set is always nonempty, since d = 0 is always feasible. When ε > 0, (4.3) may be infeasible, and by Lemma 4.1, x ∈ S is then an approximate critical point of Problem (1.1). Further, we have the following conclusions.

Lemma 4.2 For any given positive definite matrix B, the following conclusions hold:
(i) for any given noncritical point x, the feasible set of Problem (4.3) is nonempty and closed for any ε > 0;
(ii) for any given noncritical point x, Sub_B(x, ε) converges to Sub_B(x, 0) as ε → 0;
(iii) for any given ε > 0, Sub_B(·, ε) is discontinuous at every critical point x.

Proof. (i) The conclusion follows directly from Lemma 4.1.
(ii) Because the feasible region of Sub_B(x, ε) is a subset of the feasible region of Sub_B(x, 0), we have Sub_B(x, ε) ≥ Sub_B(x, 0) for all ε > 0. Hence we only have to show that

∀δ > 0, ∃ε̃ > 0 such that Sub_B(x, ε) ≤ Sub_B(x, 0) + δ, ∀ε ∈ (0, ε̃].

This is evident from the construction of the problem. Therefore, the conclusion holds.
(iii) From Lemma 4.1, the feasible set of (4.3) is empty at a critical point whenever ε > 0. Therefore, the assertion holds.

Next, we present numerical applications of Algorithms 1 and 2. The codes are written in Matlab 7.0 and the tests are conducted on a DELL computer with an Intel(R) Core(TM) i5-2400 processor (3.10 GHz) and 4.00 GB of memory. Let the sample space be Ω := {ω_1, ..., ω_L} and suppose that p_j = P{ω_j ∈ Ω} = 1/L, j = 1, ..., L, with the scenarios drawn uniformly from [0, 1]. We set L = 200 in the following numerical tests. The parameters in Algorithm 1 are set as follows:

η = 1.5, σ = 0.5, b = 1, b̄ = 2, integer M = 10, θ_0^0 = 1, and B_0 = I, while the parameters in Algorithm 2 are set as follows: η = 0.5, σ = 0.5, α = b = 0.5, ᾱ = b̄ = 1.5, B_0 = I, M = 10. We choose ε = 10^{−5} in our numerical tests.

[Figure 1: CPU time of Algorithm 1 with M = 0 and M = 10]
[Figure 2: Iterations of Algorithm 1 with M = 0 and M = 10]
[Figure 3: CPU time of Algorithm 2 with M = 0 and M = 10]
[Figure 4: Iterations of Algorithm 2 with M = 0 and M = 10]
[Figure 5: CPU time of Algorithms 1 and 2]
[Figure 6: Iterations of Algorithms 1 and 2]

We now solve the bi-objective optimization problem (4.1) by Algorithm 1 and Algorithm

2 with the initial point randomly generated from the feasible set of (4.1), for n = 5, 10, 50, 100, 150, 200, 300, 500. To show the efficiency of the nonmonotone gradient methods, we compare the performance of Algorithms 1 and 2 with the corresponding monotone gradient methods, that is, M = 0 in Algorithms 1 and 2. The number of iterations and the CPU time in seconds are reported in Figures 1-4. Generally, Algorithms 1 and 2 outperform the monotone gradient methods in both the number of iterations and the CPU time. The results suggest that our methods are efficient for all the tested problems and verify the theoretical assertions about the new algorithms.

We also compare the performance of Algorithms 1 and 2 with M = 10. The number of iterations and the CPU time in seconds are reported in Figures 5 and 6, respectively. It can be seen that Algorithm 1 outperforms Algorithm 2 in both the number of iterations and the CPU time.

For the choice of the parameter M, to the best of our knowledge there are no theoretical results on how to choose it. As recommended by Grippo, Lampariello and Lucidi (1986), M = 10 is often used (Zhang and Hager, 2004). To explore the dependence of our methods on the parameter M, we solve (4.1) with n = 50 and M = 0, 1, 5, 10, 15, 20, 25, 30. The CPU time of Algorithms 1 and 2 is reported in Figure 7, and the number of iterations in Figure 8. It can be seen that quite satisfactory results are obtained for 5 ≤ M ≤ 15.

[Figure 7: CPU time of Algorithms 1 and 2 with different choices of M]
[Figure 8: Iterations of Algorithms 1 and 2 with different choices of M]

5 Conclusion

In this paper, we study a class of vector optimization problems with C-convex objective functions. Two nonmonotone gradient algorithms are presented, and the global and local convergence of the proposed algorithms is proved. To show the efficiency of the proposed algorithms, an application to portfolio optimization under bi-criteria considerations is given. Future studies may extend the proposed algorithms to further practical applications under multi-criteria considerations.

Acknowledgements. This work is supported by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2014043) and by the Natural Science Foundation of China (No. 71571055). We thank the referees for their careful reading of the manuscript and constructive suggestions.

References

[1] BELLO CRUZ, J.Y. A subgradient method for vector optimization problems. SIAM J. Optim. 2013, 23(4), 2169-2182.
[2] BELLO CRUZ, J.Y., LUCAMBIO PÉREZ, L.R., MELO, J.G. Convergence of the projected gradient method for quasiconvex multiobjective optimization. Nonlinear Anal. 2011, 74, 5268-5273.
[3] BENTO, G.C., CRUZ NETO, J.X., SOUBEYRAN, A. A proximal point-type method for multicriteria optimization. Set-Valued Var. Anal. 2014, 22, 557-573.
[4] BONNEL, H., IUSEM, A.N., SVAITER, B.F. Proximal methods in vector optimization. SIAM J. Optim. 2005, 15, 935-970.
[5] BRITO, A.S., CRUZ NETO, J.X., SANTOS, P.S.M., SOUZA, S.S. A relaxed projection method for solving multiobjective optimization problems. Eur. J. Oper. Res. 2017, 256, 17-23.
[6] CARRIZO, G.A., LOTITO, P.A., MACIEL, M.C. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Math. Program. Ser. A 2016, 159, 339-369.
[7] DAI, Y.H. On the nonmonotone line search. J. Optim. Theory Appl. 2002, 112, 315-330.
[8] EHRGOTT, M. Multicriteria Optimization, Lecture Notes in Economics and Mathematical Systems 491. Springer-Verlag, Berlin, 2000.
[9] FLIEGE, J., GRAÑA DRUMMOND, L.M., SVAITER, B.F. Newton's method for multiobjective optimization. SIAM J. Optim. 2009, 20(2), 602-626.
[10] FLIEGE, J., SVAITER, B.F. Steepest descent methods for multicriteria optimization. Math. Methods Oper. Res. 2000, 51, 479-494.
[11] FUKUDA, E.H., GRAÑA DRUMMOND, L.M. On the convergence of the projected gradient method for vector optimization. Optim. 2011, 60(8-9), 1009-1021.
[12] FUKUDA, E.H., GRAÑA DRUMMOND, L.M. Inexact projected gradient method for vector optimization. Comput. Optim. Appl. 2013, 54, 473-493.
[13] GRAÑA DRUMMOND, L.M., IUSEM, A.N. A projected gradient method for vector optimization problems. Comput. Optim. Appl. 2004, 28, 5-29.
[14] GRIPPO, L., LAMPARIELLO, F., LUCIDI, S. A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 1986, 23, 707-716.
[15] HANDL, J., KELL, D.B., KNOWLES, J. Multiobjective optimization in bioinformatics and computational biology. IEEE ACM T. Comput. Bi. 2007, 4, 279-290.
[16] IUSEM, A.N., SVAITER, B.F., TEBOULLE, M. Entropy-like proximal methods in convex programming. Math. Oper. Res. 1994, 19, 790-814.
[17] JAHN, J. Vector Optimization: Theory, Applications, and Extensions. Springer, Erlangen, 2003.
[18] JI, Y., LI, Y.J., ZHANG, K.C., ZHANG, X.L. A new nonmonotone trust-region method of conic model for solving unconstrained optimization. J. Comput. Appl. Math. 2010, 233, 1746-1754.
[19] LU, Z.S., ZHANG, Y. An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 2012, 135, 149-193.
[20] LUO, Z.Q., TSENG, P. Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 1993, 46, 157-178.
[21] PARDALOS, P.M., HEARN, D. Multi-criteria Decision Analysis via Ratio and Difference Judgement. Kluwer Academic Publishers, Dordrecht, 1999.
[22] QU, S.J., GOH, M., CHAN, F.T.S. Quasi-Newton methods for solving multiobjective optimization. Oper. Res. Lett. 2011, 39, 397-399.
[23] QU, S.J., GOH, M., JI, Y., SOUZA, R.D. A new algorithm for linearly constrained c-convex vector optimization with a supply chain network risk application. Eur. J. Oper. Res. 2015, 247, 359-365.
[24] QU, S.J., GOH, M., SOUZA, R.D., WANG, T.N. Proximal point algorithms for convex multi-criteria programs with applications to supply chain risk management. J. Optim. Theory Appl. 2014, 163(3), 949-956.
[25] QU, S.J., LIU, C., GOH, M., LI, Y.J., JI, Y. Nonsmooth multiobjective programming with quasi-Newton methods. Eur. J. Oper. Res. 2014, 235(3), 503-510.
[26] QU, S.J., GOH, M., ZHANG, X.J. A new hybrid method for nonlinear complementarity problems. Comput. Optim. Appl. 2011, 49, 493-520.
[27] ROCKAFELLAR, R.T. Convex Analysis. Princeton University Press, Princeton, 1970.
[28] ROBINSON, S.M. Some continuity properties of polyhedral multifunctions. Math. Prog. Study 1981, 14, 206-214.
[29] SHI, Z.J., WANG, S.Q. Nonmonotone adaptive trust region method. Eur. J. Oper. Res. 2011, 208, 28-36.
[30] TOINT, P. An assessment of nonmonotone linesearch techniques for unconstrained optimization. SIAM J. Sci. Comp. 1996, 17, 725-739.
[31] TSENG, P., YUN, S. A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 2009, 117, 387-423.
[32] VIEIRA, D.A.G., TAKAHASHI, R.H.C., SALDANHA, R.R. Multicriteria optimization with a multiobjective golden section line search. Math. Program. Ser. A 2012, 131, 131-161.
[33] VILLACORTA, K.D.V., OLIVEIRA, P.R. An interior proximal method in vector optimization. Eur. J. Oper. Res. 2011, 214, 485-492.
[34] ZHANG, H.C., HAGER, W.W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 2004, 14(4), 1043-1056.