Accepted Manuscript
Nonmonotone Gradient Methods for Vector Optimization with a Portfolio Optimization Application
Shaojian Qu, Ying Ji, Jianlin Jiang, Qingpu Zhang
PII: S0377-2217(17)30455-1
DOI: 10.1016/j.ejor.2017.05.027
Reference: EOR 14455
To appear in: European Journal of Operational Research
Received date: 14 October 2015
Revised date: 10 February 2017
Accepted date: 12 May 2017
Please cite this article as: Shaojian Qu, Ying Ji, Jianlin Jiang, Qingpu Zhang, Nonmonotone Gradient Methods for Vector Optimization with a Portfolio Optimization Application, European Journal of Operational Research (2017), doi: 10.1016/j.ejor.2017.05.027
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights
• Vector optimization is studied.
• Two nonmonotone gradient algorithms are proposed for vector optimization.
• The global and local convergence results for the new algorithms are presented.
• The efficiency of the new algorithm is shown by an application to a portfolio optimization problem.
Nonmonotone Gradient Methods for Vector Optimization with a Portfolio Optimization Application

Shaojian Qu^{1,2}∗, Ying Ji^{1,2}, Jianlin Jiang^3, Qingpu Zhang^4

1. Business School, University of Shanghai for Science and Technology, 516 Jungong Road, Shanghai 200093, P.R. China. Tel: +86-15601980161
2. The Institute of Logistics-Asia Pacific, National University of Singapore
3. College of Science, Nanjing University of Aeronautics and Astronautics
4. School of Management, Harbin Institute of Technology

May 26, 2017
Abstract
This paper proposes two nonmonotone gradient algorithms for a class of vector optimization problems with a C-convex objective function. We establish both global and local convergence results for the new algorithms. We then apply the new algorithms to a portfolio optimization problem under multi-criteria considerations.

Key words: (S) Multiple objective programming; Nonmonotone gradient algorithms; Pareto optimum; Convergence; Portfolio optimization

∗ Email address: [email protected]
1 Introduction
Vector-valued optimization stems from multi-objective programming, multi-criteria decision making, statistics, and cooperative game theory (see Jahn, 2003; Handi, Kell and Knowles, 2007). Such optimization problems have been extensively studied and applied in various decision-making contexts. Let C ⊆ R^m be a convex, closed and pointed cone with int(C) ≠ ∅. We define the partial order ⪯ as follows: for any u, v ∈ R^m, u ⪯ v ⟺ v − u ∈ C. We also define the relation ≺ as u ≺ v ⟺ v − u ∈ int(C). In this paper, we consider a class of C-convex vector optimization problems,

min_C f(x)  s.t.  x ∈ S,   (1.1)

where f : R^n → R^m is C-convex (see the definition in Section 2), and S ⊆ R^n is the constraint set, which is assumed to be closed and convex. Our goal is to propose two implementable nonmonotone gradient algorithms to find a critical point (see Definition 2.2) of (1.1). Note that "min_C" denotes the optimum with respect to the cone C. It is known that many practical problems can be cast in the format of (1.1); the interested reader is referred to Pardalos and Hearn (1999, Chap. 10) and Ehrgott (2000) for detailed modelling in this regard.
Recently, several iterative methods for solving scalar optimization problems have been extended to solve multi-criteria optimization problems (see Villacorta and Oliveira, 2011; Bello Cruz, Pérez, and Melo, 2011; Bonnel, Iusem, Svaiter, 2005; Qu, Goh, Souza and Wang, 2014; Bento, Cruz Neto, and Soubeyran, 2014; Iusem, Svaiter, and Teboulle, 1994; Qu, Goh, Ji, and Souza, 2014; Britoa, Cruz Neto, Santos, and Souza, 2017; Carrizo, Lotito, and Maciel, 2016). Our study is in line with these ideas: we propose a similar extension for the case of nonmonotone gradient methods for scalar convex optimization.

Gradient-based methods, which seek a point satisfying the first-order conditions for Pareto optimality, are already widely used in the literature. Fliege and Svaiter (2000) presented a steepest descent method for solving multiobjective optimisation problems that does not rely on scalarisation approaches. At every iteration, this method has two features: (i) an extended steepest descent step computes a direction along which feasible solutions dominating the current point can be found; (ii) a line search based on the Armijo rule then finds a point that dominates the current one along this direction. Vieira et al. (2012) generalise this idea, replacing the Armijo rule with a multiobjective golden section line search. Fliege et al. (2009) propose Newton methods for multiobjective convex optimization; under twice continuous differentiability and strong local convexity, superlinear convergence is established. Qu et al. (2011, 2014) generalize this to quasi-Newton methods for multiobjective optimization without convexity assumptions; under standard assumptions, global and local convergence results are given.
To our knowledge, the gradient-based methods listed above mostly use monotone line search, and the monotonicity of the function values plays a crucial role in the convergence proofs. Our gradient-based algorithms differ from these methods in that they use nonmonotone line search. Bello Cruz (2013) presents a subgradient method for vector optimization problems without using scalar-valued objectives, which does not need the monotonicity of the function values in the convergence analysis. The main difference between our method and that of Bello Cruz (2013) is that we employ a nonmonotone line search, whereas Bello Cruz (2013) does not use a line search at all.

The primary contributions of this paper are as follows. We extend the nonmonotone gradient algorithm to a vector optimization setting and demonstrate the power of the new algorithms in solving vector optimization problems. We prove global convergence for the new algorithms and analyze their local linear convergence rate. We apply the new method to portfolio management, where conflicting multi-criteria considerations can affect performance.
This paper is organized as follows. Section 2 presents some preliminaries and notation. Section 3 extends the nonmonotone gradient algorithm to a vector optimization setting and presents the convergence of the new algorithms; the convergence rate is also analyzed there. An application to a portfolio management problem is presented in Section 4. Section 5 concludes.
2 Preliminaries
In what follows, R is the set of real numbers, R_+ denotes the set of non-negative real numbers, and R_{++} is the set of strictly positive real numbers. A function f : R^n → R^m is C-convex iff for any x, y ∈ R^n and any λ ∈ [0, 1],

f(λx + (1 − λ)y) ⪯ λf(x) + (1 − λ)f(y).

The function f is strictly C-convex iff for any x, y ∈ R^n and any λ ∈ (0, 1),

f(λx + (1 − λ)y) ≺ λf(x) + (1 − λ)f(y).
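For C = R^m_+, C-convexity is simply componentwise convexity. As a quick numerical sanity check of the defining inequality (on a toy map of our own, not an example from the paper):

```python
import numpy as np

# f(x) = (x^2, (x-1)^2) maps R -> R^2; each component is convex, so f is
# R^2_+-convex: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) componentwise.
f = lambda x: np.array([x**2, (x - 1.0)**2])

ok = all(
    np.all(f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12)
    for lam in np.linspace(0.0, 1.0, 11)
    for x in np.linspace(-2.0, 2.0, 9)
    for y in np.linspace(-2.0, 2.0, 9)
)
```

The 1e-12 slack only guards against floating-point round-off in the grid check.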
When f is differentiable, we denote by Jf(x) ∈ R^{m×n} the Jacobian of f at x. The indicator function of a set S ⊂ R^n is defined by δ(x; S) := 0 if x ∈ S and δ(x; S) := +∞ if x ∉ S. The normal cone to a convex set S at x̄ ∈ S is defined by

N_S(x̄) := ∂δ(x̄; S) = {x* ∈ R^n | ⟨x*, x − x̄⟩ ≤ 0, ∀x ∈ S}.
In a multi-criteria setting, there are many notions of optimality. Strong Pareto optimality (SPO) and weak Pareto optimality (WPO) are two of the most important, defined as follows.

Definition 2.1 A point x* ∈ S is a weak Pareto optimum (WPO) (resp. strong Pareto optimum (SPO)) of (1.1) iff there is no x ∈ S such that f(x) ≺ f(x*) (resp. f(x) ⪯ f(x*), f(x) ≠ f(x*)).
A necessary (but in general not sufficient) condition for Pareto optimality is defined as follows (see Bello Cruz, Pérez, and Melo, 2011; Qu, Goh, Souza and Wang, 2014; Fliege and Svaiter, 2000).
Definition 2.2 x* ∈ S is a critical point (or stationary point) of f if

R(Jf(x*)) ∩ (−int(C)) = ∅,   (2.1)

where R(Jf(x*)) denotes the range (image space) of the Jacobian of the continuously differentiable function f at x*.
The above concept is used to develop gradient methods for solving Problem (1.1) and has been extensively used to develop descent-type algorithms for vector optimization in recent years (Fliege and Svaiter, 2000; Qu, Goh, and Chan, 2011). In the rest of this section, we present an alternative characterization of criticality.

Define the positive polar cone of C, C* := {y ∈ R^m : y^T x ≥ 0, ∀x ∈ C}. Given a point x ∈ S, a positive definite matrix B ∈ R^{n×n}, and a closed and nonempty set U ⊂ {u ∈ C* : ‖u‖ = 1}, we denote by d_B(x) an optimal solution of the following problem for x ∈ S:

min_d max_{u∈U} u^T Jf(x) d + ½ d^T B d,  s.t.  ‖d‖ ≤ 1, x + d ∈ S.   (2.2)
The feasible set of Problem (2.2) is always nonempty since d = 0 is feasible for any x ∈ S. The direction-norm constraint ‖d‖ ≤ 1 is used to improve performance, as it eliminates the possibly unbounded case ‖d‖ → ∞; this device is also used in Qu, Goh, and Chan (2011). In Problem (2.2) we only assume the closedness of U, without convexity. In fact, we can use the convex hull of U in (2.2) without changing the problem; that is, finding an optimal solution of (2.2) is equivalent to finding an optimal solution of the following problem,

min_d max_{u∈Conv(U)} u^T Jf(x) d + ½ d^T B d,  s.t.  ‖d‖ ≤ 1, x + d ∈ S,   (2.3)
where Conv(·) denotes the convex hull. This is shown in the following lemma.

Lemma 2.3 For any given x ∈ S, the optimal solution set of (2.2) is also the optimal solution set of (2.3).

Proof.
For any given x ∈ S, we first cast (2.2) in the form min{r | (r, d) ∈ Ω(x)}, with Ω(x) := {(r, d) ∈ R^{n+1} | x + d ∈ S, u^T Jf(x) d + ½ d^T B d ≤ r, ∀u ∈ U, ‖d‖ ≤ 1}. Define Ω̃(x) := {(r, d) ∈ R^{n+1} | x + d ∈ S, u^T Jf(x) d + ½ d^T B d ≤ r, ∀u ∈ Conv(U), ‖d‖ ≤ 1}, x ∈ S. The proof now reduces to showing that Ω(x) = Ω̃(x). Clearly, for any x ∈ S, Ω̃(x) ⊂ Ω(x). Next, we show that Ω(x) ⊂ Ω̃(x). Since for any given x ∈ S, Ω(x) ≠ ∅, there is (r, d) ∈ Ω(x); we will show that (r, d) ∈ Ω̃(x). Given any u ∈ Conv(U), there exist u^i ∈ U, i = 1, …, k, such that u = Σ_{i=1}^k λ_i u^i, where λ_i ≥ 0 (i = 1, …, k) and Σ_{i=1}^k λ_i = 1, with some k ≤ m + 1. It follows from the definition of Ω(x) that (u^i)^T Jf(x) d + ½ d^T B d ≤ r, i = 1, …, k. Therefore, u^T Jf(x) d + ½ d^T B d = Σ_{i=1}^k λ_i (u^i)^T Jf(x) d + ½ d^T B d ≤ r, which implies that (r, d) ∈ Ω̃(x). Then Ω(x) ⊂ Ω̃(x). Therefore, we have Ω(x) = Ω̃(x), ∀x ∈ S.
Lemma 2.3 implies that we can always assume that U is closed and convex. For the rest of this paper, we assume that U ⊂ {y ∈ C* : ‖y‖ = 1} is closed and convex, and that the cone generated by its convex hull is C*. The following lemma provides an alternative characterization of criticality that will be used in our development of algorithms for (1.1).

Lemma 2.4 For any positive definite matrix B ∈ R^{n×n}, x* ∈ S is a critical point of Problem (1.1) if and only if d_B(x*) = 0, where d_B(x*) is an optimal solution of Problem (2.2) with x := x*.
Proof. First, we show that if x* ∈ S is a critical point of Problem (1.1), then d_B(x*) = 0, where d_B(x*) is an optimal solution of Problem (2.2) with x = x*. We prove this by contradiction: if d_B(x*) ≠ 0, we show that Jf(x*) d_B(x*) ∈ −int C. The optimal value Sub_B(x*) of (2.2) at x* is nonpositive, since d = 0 is always feasible for (2.2). As B is positive definite and d_B(x*) ≠ 0, from the definition of d_B(x*) we have, for all u ∈ U,

u^T Jf(x*) d_B(x*) < u^T Jf(x*) d_B(x*) + ½ d_B(x*)^T B d_B(x*)
  ≤ max_{u∈U} u^T Jf(x*) d_B(x*) + ½ d_B(x*)^T B d_B(x*)
  = Sub_B(x*) ≤ 0.

This means that Jf(x*) d_B(x*) ∈ −int C, i.e., x* is noncritical, contradicting that x* ∈ S is a critical point.

Next, we show by contradiction that if d_B(x*) = 0 is an optimal solution of Problem (2.2) with x = x*, then x* is a critical point of Problem (1.1). If not, there exists a direction d̄ ∈ R^n such that u^T Jf(x*) d̄ < 0, ∀u ∈ U. From the positive definiteness of B, there exists ᾱ ∈ (0, 1] such that ᾱ u^T Jf(x*) d̄ + ½ ᾱ² d̄^T B d̄ < 0, ∀u ∈ U, which means that the optimal value of (2.2) is negative. This contradicts d_B(x*) = 0. Hence, the assertion of the lemma is true.
Note that the objective function max_{u∈U} u^T Jf(x) d + ½ d^T B d in Problem (2.2) is not necessarily differentiable. Therefore, it is difficult to solve Problem (2.2) directly. By introducing a variable β, this problem can be reduced to

min_{β,d} β + ½ d^T B d,  s.t.  u^T Jf(x) d ≤ β, u ∈ U, ‖d‖ ≤ 1, x + d ∈ S.   (2.4)
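To see the subproblem in action, the sketch below (our own illustration, not from the paper) solves (2.2)/(2.4) for the simplest setting n = 1, S = R, C = R^2_+, U the standard basis, and B = θ, by brute-force search over the ball ‖d‖ ≤ 1; for n > 1 one would instead hand (2.4) to a QP solver.

```python
import numpy as np

def subproblem(grads, theta=1.0):
    # Solve (2.2) for n = 1: minimize phi(d) = max_i g_i*d + 0.5*theta*d^2
    # over d in [-1, 1], where g_i are the objective gradients (rows of Jf(x)).
    d = np.linspace(-1.0, 1.0, 20001)
    phi = np.max(np.outer(grads, d), axis=0) + 0.5 * theta * d**2
    i = np.argmin(phi)
    return d[i], phi[i]                  # d_B(x) and the optimal value Sub_B(x)

# Toy instance f(x) = (x^2, (x-1)^2): the gradients are 2x and 2(x-1).
d0, v0 = subproblem(np.array([1.0, -1.0]))   # x = 0.5: d_B = 0, x is critical
d1, v1 = subproblem(np.array([4.0, 2.0]))    # x = 2.0: d_B = -1, Sub_B < 0
```

Consistent with Lemma 2.4, d_B(x) = 0 (equivalently, a zero optimal value) occurs exactly at critical points, while noncritical points yield a strictly negative optimal value.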
3 Nonmonotone Gradient Algorithms
In this section, we present two nonmonotone gradient algorithms. We also establish global convergence and a local linear rate of convergence for the proposed methods. In the literature (Fukuda and Graña Drummond (2011, 2013), Graña Drummond and Iusem (2004), Fliege and Svaiter (2000), and Qu, Goh and Souza (2011)), several gradient methods were proposed for solving problem (1.1) or its special cases. In particular, Graña Drummond and Iusem (2004) proposed an extension of the projected gradient method for vector optimization problems. Under some reasonable assumptions, Fukuda and Graña Drummond (2011) established full convergence to optimal points of any sequence produced by the projected gradient method with an Armijo-like rule, no matter how poor the initial guesses may be. In addition, Fliege and Svaiter (2000) proposed a steepest descent method and Qu, Goh and Souza (2011) developed a quasi-Newton algorithm for multi-objective optimization. We note that the above gradient methods are all based on monotone line search, i.e., they only accept the trial point as the next iterate if its associated objective function value is strictly lower than that at the current iterate, which can slow the convergence rate in the minimization process, especially in the presence of narrow curved valleys (Grippo, Lampariello, and Lucidi, 1986). Toint (1996) pointed out that the nonmonotone technique is helpful in overcoming the case where the sequence of iterates is forced to follow the bottom of curved narrow valleys (a common occurrence in difficult nonlinear problems). Since then, many authors have proposed nonmonotone line search-based algorithms for solving scalar optimization problems (see for instance Qu, Goh, and Zhang, 2011; Ji et al., 2010). These papers indicate that nonmonotone algorithms are efficient, especially for ill-conditioned problems. However, in contrast to nonmonotone line search-based algorithms for scalar optimization, studies on nonmonotone line search-based algorithms for multi-objective optimization are relatively few.
3.1 Algorithm 1
Algorithm 1:
Step 0. Initialization: Choose parameters η > 1, σ ∈ (0, 1), 0 < b < b̄, 0 < θ ≤ θ̄, and an integer M ≥ 0. Choose θ_k^0 ∈ [θ, θ̄] and bI ⪯ B_k ⪯ b̄I. Let x^0 ∈ S be the initial vector. Set k := 0.
Step 1. For j = 0, 1, …:
 1a. Let B := θ_k B_k with θ_k := θ_k^0 η^j. Solve Problem (2.2) with x := x^k and obtain the optimal solution d_B(x^k). If d_B(x^k) = 0, then stop: x^k is a critical point. Otherwise, set d^k := d_B(x^k).
 1b. If d^k satisfies

f(x^k + d^k) ⪯ max_{[k−M]^+ ≤ i ≤ k} f(x^i) + σ Jf(x^k) d^k,   (3.1)

go to Step 2, where max_{[k−M]^+ ≤ i ≤ k} f(x^i) is taken componentwise; otherwise, set j := j + 1 and return to 1a.
Step 2. Update: Let x^{k+1} := x^k + d^k. Set k := k + 1 and return to Step 1.

Remark. The above method is closely related to the one proposed by Qu, Goh, and Chan (2011). They differ in that a nonmonotone line search is used in inequality (3.1), which can overcome the case where the sequence of iterates is forced to follow the bottom of curved narrow valleys, a common occurrence in difficult nonlinear problems (see Grippo, Lampariello, and Lucidi (1986), Toint (1996)). In monotone line search methods, f(x^{k+1}) ⪯ f(x^k) is enforced; in nonmonotone line search methods, some growth in the function value is permitted. Many researchers have shown that nonmonotone schemes for scalar optimization problems can improve the likelihood of finding a global optimum (see Dai (2002), Toint (1996)), and encouraging numerical results on difficult nonlinear scalar optimization problems have been reported (see Zhang and Hager (2004), Shi and Wang (2011), Lu and Zhang (2012), Ji, Zhang and Zhang (2010)). However, to the best of our knowledge, there is no research on extending nonmonotone schemes to vector optimization problems. When M = 0 in (3.1), the rule reduces to the monotone line search that has been extensively used in gradient-based algorithms for multi-objective optimization. Our method can be viewed as an extension of the steepest descent method studied by Fliege and Svaiter (2000). We note that local convergence is established for our method, but was not studied for the methods given by Qu, Goh, and Chan (2011) and Fliege and Svaiter (2000). We now explain the role of the constants involved in the above algorithm, i.e., M, B_k, b, b̄, and θ_k. The constant M is a prefixed nonnegative integer used in the nonmonotone line search (3.1): the trial point x^k + d^k is accepted as the next iterate if its associated objective function value is strictly lower than max_{[k−M]^+ ≤ i ≤ k} f(x^i), i.e., the maximal function value from the [k−M]^+-th iteration to the k-th iteration. At the k-th iteration, B_k replaces B in problem (2.2) with x := x^k; b and b̄ bound B_k from below and above, respectively. θ_k scales B in (2.2); for sufficiently large θ_k, Lemma 3.1 below shows that d_B(x^k) obtained by solving (2.2) satisfies the nonmonotone line search (3.1). The well-posedness of Algorithm 1 is established in Lemma 3.1 below.
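To make the scheme concrete, here is a minimal sketch of Algorithm 1 (our own illustration, not the paper's implementation; n = 1, S = R, C = R^2_+, B_k = I are assumed, and the subproblem is solved by brute-force search):

```python
import numpy as np

def direction(grads, theta):
    # Subproblem (2.2) for n = 1, B = theta*I, U the standard basis of C* = R^2_+:
    # minimize max_i g_i*d + 0.5*theta*d^2 over d in [-1, 1], by direct search.
    d = np.linspace(-1.0, 1.0, 20001)
    phi = np.max(np.outer(grads, d), axis=0) + 0.5 * theta * d**2
    i = np.argmin(phi)
    return d[i], phi[i]                        # d_B(x), optimal value Sub_B(x)

def algorithm1(f, Jf, x0, M=3, sigma=0.1, eta=2.0, theta0=1.0, iters=60):
    x, hist = x0, [f(x0)]
    for _ in range(iters):
        theta = theta0
        while True:                            # Step 1: inner loop over j
            d, val = direction(Jf(x), theta)
            if val > -1e-10:                   # Sub_B(x) = 0: x critical (Lemma 2.4)
                return x
            ref = np.max(hist[-(M + 1):], axis=0)   # componentwise max in (3.1)
            if np.all(f(x + d) <= ref + sigma * Jf(x) * d):   # condition (3.1)
                break
            theta *= eta                       # theta_k := theta_k^0 * eta^j
        x = x + d                              # Step 2: unit step along d^k
        hist.append(f(x))
    return x

f  = lambda x: np.array([x**2, (x - 1.0)**2])  # toy bi-objective instance
Jf = lambda x: np.array([2.0 * x, 2.0 * (x - 1.0)])
x_star = algorithm1(f, Jf, x0=3.0)             # lands in the Pareto set [0, 1]
```

The nonmonotone reference value is the componentwise maximum of the last M + 1 objective vectors, so an iterate may increase some components of f as long as it improves on that reference.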
Lemma 3.1 If x^k ∈ S is noncritical, then there exists θ̃ > 0 such that d^k := d_{B_k(θ_k)}(x^k), where B_k(θ_k) := θ_k B_k, satisfies (3.1) whenever θ_k ≥ θ̃.

Proof. For simplicity, we let d(θ) := d_{B_k(θ)}(x^k) with B_k(θ) := θB_k. From the feasibility of d(θ) for (2.4), we obtain that for all θ > 0, d(θ) ≠ 0 and, further,

θ‖d(θ)‖ ≤ −2u^T Jf(x^k) d(θ) / (λ_min(B_k) ‖d(θ)‖), ∀u ∈ U(x^k),   (3.2)

where U(x) denotes the solution set of the inner maximization problem of (2.2) for the given x ∈ S. Inequality (3.2) implies that θ‖d(θ)‖ is bounded for all θ > 0; therefore, ‖d(θ)‖ → 0 as θ → ∞. To prove the lemma, we need the following conclusion:

lim inf_{θ→∞} θ‖d(θ)‖ > 0.   (3.3)

We prove (3.3) by contradiction. If (3.3) is not true, there exists a sequence {θ̄_l} with θ̄_l → ∞ such that

θ̄_l ‖d(θ̄_l)‖ → 0 as l → ∞,   (3.4)
where d(θ̄_l) is the optimal solution of (2.4) with x := x^k, B := θ̄_l B_k and θ := θ̄_l. From the definition of U(x^k), the following conclusion holds:

u^T Jf(x^k) d + ½ d^T B d ≤ β, ∀u ∈ U(x^k) ⟺ u^T Jf(x^k) d + ½ d^T B d ≤ β, ∀u ∈ C*.   (3.5)

Therefore, d(θ̄_l) is also an optimal solution of the following problem:

min_{β,d} β,  s.t.  u^T Jf(x^k) d + ½ d^T B d ≤ β, u ∈ C*, ‖d‖² ≤ 1, β ≤ 0, x + d ∈ S,   (3.6)

where B = θ̄_l B_k. This together with the definition of C* implies that there exist a positive integer q, u^j ∈ U and λ_j > 0, j = 1, …, q, with Σ_{j=1}^q λ_j = 1, such that

0 ∈ Σ_{j=1}^q λ_j Jf(x^k)^T u^j + θ̄_l B_k d(θ̄_l) + 2γ d(θ̄_l) + N_S(x^k + d(θ̄_l)),   (3.7)

γ(1 − ‖d(θ̄_l)‖²) = 0, γ ≥ 0, 1 − ‖d(θ̄_l)‖² ≥ 0.   (3.8)
By taking limits on both sides of (3.7) and (3.8) as l → ∞, using the semicontinuity of N_S(·) (see Theorem 24.4 of Rockafellar, 1970) and the relations θ̄_l‖d(θ̄_l)‖ → 0 and ‖d(θ̄_l)‖ → 0, we see that

0 ∈ Σ_{j=1}^q λ_j Jf(x^k)^T u^j + N_S(x^k),   (3.9)

which means that x^k is a stationary point of (1.1). This contradicts the noncriticality of x^k. Hence, (3.3) holds. This conclusion together with θ d(θ)^T B_k d(θ) ≥ λ_min(B_k) θ ‖d(θ)‖² (λ_min(B_k) > 0 from the generation of B_k) leads to

‖d(θ)‖ = O(θ d(θ)^T B_k d(θ)) as θ → ∞.   (3.10)

From the first inequality of (3.6),

−u^T Jf(x^k) d(θ) ≥ ½ θ d(θ)^T B_k d(θ) ≥ ½ λ_min(B_k) θ ‖d(θ)‖², ∀u ∈ C*.   (3.11)

This relation together with (3.10) means that, as θ → ∞,

‖d(θ)‖ = O(θ d(θ)^T B_k d(θ)) = O(−u^T Jf(x^k) d(θ)), ∀u ∈ C*.   (3.12)
From this result and the relation ‖d(θ)‖ → 0 as θ → ∞, we obtain

u^T(f(x^k + d(θ)) − f(x^k)) = u^T Jf(x^k) d(θ) + o(‖d(θ)‖) ≤ σ u^T Jf(x^k) d(θ), ∀u ∈ C*,   (3.13)

provided θ is sufficiently large. Using this result, we prove (3.1) by contradiction, i.e., suppose there is ū ∈ C* such that, as θ → ∞,

ū^T(f(x^k + d(θ)) − max_{[k−M]^+ ≤ i ≤ k} f(x^i)) > σ ū^T Jf(x^k) d(θ).   (3.14)

This together with

ū^T(f(x^k + d(θ)) − max_{[k−M]^+ ≤ i ≤ k} f(x^i)) ≤ ū^T(f(x^k + d(θ)) − f(x^k))   (3.15)

implies that

ū^T(f(x^k + d(θ)) − f(x^k)) > σ ū^T Jf(x^k) d(θ)   (3.16)

holds for sufficiently large θ. This contradicts (3.13), and therefore the assertion is true.
3.1.1 Convergence
In the following lemma, we prove that the search directions {d^k} converge to zero and that the sequence of objective values {f(x^k)} converges, i.e., lim_{k→∞} d^k = 0 and lim_{k→∞} f(x^k) = f* for some f* ∈ R^m. From Lemma 2.4, the convergence of {d^k} to zero implies that the corresponding limit point of {x^k}, if it exists, is a critical point of (1.1). As Algorithm 1 is a descent-type method, i.e., at each iteration a feasible solution is accepted only when it dominates the current point, the convergence of {f(x^k)} implies that at the limit point no feasible solution can be found that dominates it. For the proof of Lemma 3.2, we suppose that the sequence {f(x^k)} is C-bounded from below, i.e., there exists f̄ ∈ R^m such that f̄ ⪯ f(x^k), ∀k.

The concept of C-boundedness generalizes boundedness from below for scalar-valued functions and has been used extensively in convergence proofs for gradient-based methods in vector optimization (see Fukuda and Graña Drummond (2011, 2013), Qu, Goh, and Chan (2011)).
Lemma 3.2 Suppose that the sequences {d^k} and {x^k} are generated by Algorithm 1, and the sequence {f(x^k)} is C-bounded from below. Then, as k → ∞, d^k → 0 and f(x^k) → f* for some f* ∈ R^m.

Proof. For any k, define x^{l(k)} := argmax{f(x^i) : [k−M]^+ ≤ i ≤ k}, where l(k) is an integer in [[k−M]^+, k]. From the generation of {x^k}, it is obvious that f(x^{k+1}) ⪯ f(x^{l(k)}), ∀k. This together with the definition of l(k) means that f(x^{l(k+1)}) ⪯ f(x^{l(k)}), ∀k. Further, since {f(x^k)} is C-bounded from below, the following limit holds for some f* ∈ R^m:

lim_{k→∞} f(x^{l(k)}) = f*.   (3.17)

We next show that the following limits hold for all j ≥ 1:

lim_{k→∞} d^{l(k)−j} = 0,  lim_{k→∞} f(x^{l(k)−j}) = f*.   (3.18)

We prove this by induction. Replacing k with l(k) − 1 in (3.1),

f(x^{l(k)}) ⪯ f(x^{l(l(k)−1)}) + σ Jf(x^{l(k)−1}) d^{l(k)−1}.   (3.19)

Using (3.12) with k and θ replaced by l(k) − 1 and θ_{l(k)−1}, respectively, we have

u^T Jf(x^{l(k)−1}) d^{l(k)−1} ≤ −b θ_{l(k)−1} ‖d^{l(k)−1}‖², ∀u ∈ C*,   (3.20)

where the inequality follows from B_{l(k)−1} ⪰ bI. Combining (3.19) and (3.20), we have

u^T(f(x^{l(k)}) − f(x^{l(l(k)−1)})) ≤ −σ b θ_{l(k)−1} ‖d^{l(k)−1}‖², ∀u ∈ C*.   (3.21)

Letting k → ∞ in the above relation and noticing (3.17), we have lim_{k→∞} θ_{l(k)−1} ‖d^{l(k)−1}‖² = 0. This together with θ_k ≥ θ for all k implies that lim_{k→∞} d^{l(k)−1} = 0. Then, using the assumption about {f(x^k)} and (3.17) again, we have

lim_{k→∞} f(x^{l(k)−1}) = lim_{k→∞} f(x^{l(k)} − d^{l(k)−1}) = lim_{k→∞} f(x^{l(k)}) = f*.   (3.22)

Therefore, (3.18) holds for j = 1. By an argument similar to the case j = 1, it is easy to prove that if (3.18) is true for j, then it also holds for j + 1.
Finally, we show that the lemma holds. According to the definition of l(k), for k ≥ M + 1 we have k − M − 1 = l(k) − j for some 1 ≤ j ≤ M + 1, which together with the first limit in (3.18) leads to lim_{k→∞} d^k = lim_{k→∞} d^{k−M−1} = 0. Further, we note that

x^{l(k)} = x^{k−M−1} + Σ_{j=1}^{l̄_k} d^{l(k)−j}, ∀k ≥ M + 1,

where l̄_k := l(k) − (k − M − 1) ≤ M + 1. This result together with (3.18) and the assumption about {f(x^k)} gives lim_{k→∞} f(x^k) = lim_{k→∞} f(x^{k−M−1}) = f*. Therefore, the assertion of the lemma is true.

We next show that Algorithm 1 is globally convergent.
Theorem 3.3 Suppose that the sequence {x^k} is generated by Algorithm 1 and the sequence {f(x^k)} is C-bounded from below. Then any accumulation point of {x^k} is a critical point of (1.1).
Proof. We prove this theorem by contradiction, i.e., suppose there exists an accumulation point x* of {x^k} that is a noncritical point of (1.1), with a subsequence {x^k}_{k∈K} → x*. We first prove that {θ_k}_{k∈K} is bounded. If not, there is a subsequence of {θ_k}_{k∈K} that goes to ∞; without loss of generality, we assume {θ_k}_{k∈K} → ∞. This together with the choice of θ_k^0 implies that there exists some index k̄ ≥ 0 such that θ_k > θ_k^0 for all k ∈ K with k ≥ k̄. For simplicity, we define θ̄_k := θ_k/η and d^k(θ) := d_{B_k(θ)}(x^k) for k ∈ K and θ > 0, where B_k(θ) = θB_k. From the choice of θ_k in steps 1a and 1b, there exists some ū ∈ U such that

ū^T f(x^k + d^k(θ̄_k)) > max_{[k−M]^+ ≤ i ≤ k} ū^T f(x^i) + σ ū^T Jf(x^k) d^k(θ̄_k), ∀k ∈ K with k ≥ k̄.   (3.23)

Similar to the argument for (3.2), we have

θ̄_k ‖d^k(θ̄_k)‖ ≤ −2u^T Jf(x^k) d^k(θ̄_k) / (λ_min(B_k) ‖d^k(θ̄_k)‖), ∀k ∈ K, u ∈ U(x^k).   (3.24)

This along with B_k ⪰ bI and {x^k}_{k∈K} → x* means that {θ̄_k ‖d^k(θ̄_k)‖}_{k∈K} is bounded. Then {‖d^k(θ̄_k)‖}_{k∈K} → 0 as {θ̄_k}_{k∈K} → ∞. Further, similar to the proof of (3.3), we have

lim inf_{k∈K, k→∞} θ̄_k ‖d^k(θ̄_k)‖ > 0,   (3.25)
which together with an argument similar to that for (3.10) leads to

‖d^k(θ̄_k)‖ = O(θ̄_k d^k(θ̄_k)^T B_k d^k(θ̄_k)) as k ∈ K, k → ∞.   (3.26)

This result along with a proof similar to that of (3.1) in Lemma 3.1 gives that

f(x^k + d^k(θ̄_k)) ⪯ max_{[k−M]^+ ≤ i ≤ k} f(x^i) + σ Jf(x^k) d^k(θ̄_k)   (3.27)

holds for sufficiently large k ∈ K. This implies that, for all u ∈ U,

u^T f(x^k + d^k(θ̄_k)) ≤ max_{[k−M]^+ ≤ i ≤ k} u^T f(x^i) + σ u^T Jf(x^k) d^k(θ̄_k)   (3.28)

holds for sufficiently large k ∈ K, which contradicts (3.23). Therefore, {θ_k}_{k∈K} is bounded.
Finally, since d^k = d^k(θ_k) is an optimal solution to (2.4) with x = x^k and B = θ_k B_k, there exist a positive integer q, u^j ∈ U and λ_j > 0, j = 1, …, q, with Σ_{j=1}^q λ_j = 1, such that

0 ∈ Σ_{j=1}^q λ_j Jf(x^k)^T u^j + θ_k B_k d^k + 2γ d^k + N_S(x^k + d^k),   (3.29)

γ(1 − ‖d^k‖²) = 0, γ ≥ 0, 1 − ‖d^k‖² ≥ 0.   (3.30)

By taking limits on both sides of (3.29) and (3.30) as k ∈ K → ∞, using the semicontinuity of N_S(·) (see Theorem 24.4 of Rockafellar, 1970), the boundedness of θ_k, and the relation d^k → 0 (Lemma 3.2), we see that

0 ∈ Σ_{j=1}^q λ_j Jf(x*)^T u^j + N_S(x*),   (3.31)

which means that x* is a stationary point of (1.1). This contradicts the noncriticality assumption on x* made at the beginning of this proof. This contradiction implies that the conclusion of the theorem is true.

In the rest of this part, we analyze the local linear convergence rate of Algorithm 1 under the following assumption, which is a generalization of the one for scalar optimization made by Tseng and Yun (2009). In the rest of this paper, we denote by S̄ the set of critical points of problem (1.1).
Assumption 1 (a) S̄ ≠ ∅ and, for any ξ ∈ R^m with min_{x∈S} f(x) ⪯ ξ, there exist ε > 0 and ω > 0 such that

dist(x, S̄) ≤ ω ‖d_I(x)‖ whenever f(x) ⪯ ξ and ‖d_I(x)‖ ≤ ε.

(b) There exists δ > 0 such that

‖x − y‖ ≥ δ whenever x ∈ S̄, y ∈ S̄, f(x) ≠ f(y).
Assumption 1 generalizes Assumptions A and B given by Luo and Tseng (1993) for scalar optimization. Assumption 1(a) is a local Lipschitzian error bound assumption, i.e., the distance from x to S̄ is locally of the order of the norm of the residual d_I(x). Error bounds of this kind have been extensively studied in the scalar optimization literature; when the function f is polyhedral (see Robinson, 1981), Assumption 1(a) holds by the corollary given by Robinson (1981). Assumption 1(b) means that iso-cost surfaces of f attached to S̄ are "properly separated" (see Tseng and Yun, 2009); it holds whenever f takes only a finite number of vector values on S̄ or whenever the connected components of S̄ are properly separated from each other. Assumption 1 is used to prove the local linear convergence rate of Algorithm 1. The following theorem shows that Algorithm 1 converges linearly in an asymptotic sense. The proof is inspired by the proof of a similar local convergence result for a scalar optimization method by Tseng and Yun (2009).
Theorem 3.4 Suppose that {x^k} is generated by Algorithm 1, Assumption 1 holds, and the sequence {f(x^k)} is C-bounded from below. Then there exists some constant c ∈ (0, 1) such that

f(x^{l(k)}) − f* ⪯ c (f(x^{l(l(k)−1)}) − f*),   (3.32)

where l(k) is defined as in Lemma 3.2, and f* = lim_{k→∞} f(x^k).
Proof. From the proof of Theorem 3.3, {θ_k} is bounded. This together with Lemma 3.1 and the specific choice of θ_k implies that θ̂ := sup_k θ_k < ∞. Then, according to bI ⪯ B_k ⪯ b̄I and θ_k ≥ θ, we have θbI ⪯ B_k(θ_k) ⪯ θ̂b̄I, where B_k(θ_k) := θ_k B_k. Using this relation, Lemma 3.2 of Tseng and Yun (2009), bI ⪯ B_k, and d^k = d_{B_k(θ_k)}(x^k), we derive that

‖d_I(x^k)‖ = O(‖d^k‖),   (3.33)

which along with Lemma 3.2 means {d_I(x^k)} → 0. Hence, for any ε > 0, there exists some index k̄ such that ‖d_I(x^{l(k)−1})‖ ≤ ε for all k ≥ k̄. This result together with Assumption 1(a) and (3.33) implies that there exist some index k′, some x̄^{l(k)−1} ∈ S̄, and some constant c_1 > 0 such that

‖x^{l(k)−1} − x̄^{l(k)−1}‖ ≤ c_1 ‖d^{l(k)−1}‖, ∀k ≥ k′.   (3.34)

Since {d^k} → 0 and

‖x^{l(k+1)−1} − x^{l(k)−1}‖ ≤ Σ_{i=l(k)−1}^{l(k+1)−2} ‖d^i‖ ≤ Σ_{i=[k−M−1]^+}^{[k−1]^+} ‖d^i‖,   (3.35)
we have ‖x^{l(k+1)−1} − x^{l(k)−1}‖ → 0. Then, from this result, (3.34), and Lemma 3.2, the following inequalities hold:

‖x̄^{l(k+1)−1} − x̄^{l(k)−1}‖ ≤ ‖x̄^{l(k+1)−1} − x^{l(k+1)−1}‖ + ‖x^{l(k)−1} − x̄^{l(k)−1}‖ + ‖x^{l(k+1)−1} − x^{l(k)−1}‖
 ≤ c_1 ‖d^{l(k+1)−1}‖ + c_1 ‖d^{l(k)−1}‖ + ‖x^{l(k+1)−1} − x^{l(k)−1}‖ → 0.   (3.36)

According to (3.36) and Assumption 1(b), there exist an index k̂ ≥ k′ and f̄ ∈ R^m such that

f(x̄^{l(k)−1}) = f̄, ∀k ≥ k̂.   (3.37)

Then, similar to the proof of Lemma 5.1 of Tseng and Yun (2009), we can prove that

f* = lim_{k→∞} f(x^k) = lim inf_{k→∞} f(x^{l(k)−1}) ⪰ f̄.   (3.38)

Further, for any u ∈ C*, there exist a constant L and a point x̄^k lying on the segment joining x^{l(k)} with x̄^{l(k)−1} such that, for k ≥ k̂,

u^T(f(x^{l(k)}) − f̄) = u^T(f(x^{l(k)}) − f(x̄^{l(k)−1})) = u^T Jf(x̄^k)(x^{l(k)} − x̄^{l(k)−1})
 = u^T(Jf(x̄^k) − Jf(x^{l(k)−1}))(x^{l(k)} − x̄^{l(k)−1}) − (B_{l(k)−1}(θ_{l(k)−1}) d^{l(k)−1})^T (x^{l(k)} − x̄^{l(k)−1}) + [Jf(x^{l(k)−1})^T u + B_{l(k)−1}(θ_{l(k)−1}) d^{l(k)−1}]^T (x^{l(k)} − x̄^{l(k)−1})
 ≤ L ‖x̄^k − x^{l(k)−1}‖ ‖x^{l(k)} − x̄^{l(k)−1}‖ + θ̂b̄ ‖d^{l(k)−1}‖ ‖x^{l(k)} − x̄^{l(k)−1}‖,   (3.39)
where the first term in the inequality of (3.39) comes from the assumption on f, and the second comes from the nonpositivity of the third term of the third equality together with B_k(θ_k) ⪯ θ̂b̄I. From the choice of x̄^k and (3.34) it follows that, for k ≥ k̂,

‖x̄^k − x^{l(k)−1}‖ ≤ ‖x^{l(k)} − x̄^{l(k)−1}‖ ≤ ‖x^{l(k)} − x^{l(k)−1}‖ + ‖x^{l(k)−1} − x̄^{l(k)−1}‖ ≤ (1 + c_1) ‖d^{l(k)−1}‖.   (3.40)

This result together with (3.39), B_k(θ_k) ⪰ θbI, and

−u^T Jf(x^{l(k)−1}) d^{l(k)−1} ≥ (d^{l(k)−1})^T B d^{l(k)−1} ≥ λ_min(B) ‖d^{l(k)−1}‖², ∀u ∈ C*,

implies that, for k ≥ k̂ and u ∈ C*, there exists some constant c_2 > 0 such that

u^T(f(x^{l(k)}) − f̄) ≤ −c_2 u^T Jf(x^{l(k)−1}) d^{l(k)−1}.   (3.41)

From (3.19) and (3.41), we have that, for c_3 := c_2/σ and k ≥ k̂,

u^T(f(x^{l(k)}) − f̄) ≤ c_3 u^T(f(x^{l(l(k)−1)}) − f(x^{l(k)})), ∀u ∈ C*.   (3.42)

Taking limits on both sides of (3.42) and using lim_{k→∞} f(x^{l(k)}) = f*, we obtain u^T f* ≤ u^T f̄, ∀u ∈ C*, which along with (3.38) means that f̄ = f*. Using this result and rearranging terms of (3.42), we get the following inequality:

u^T(f(x^{l(k)}) − f*) ≤ c u^T(f(x^{l(l(k)−1)}) − f*), ∀k ≥ k̂, ∀u ∈ C*,   (3.43)

where c := c_3/(1 + c_3) ∈ (0, 1). This result obviously leads to the conclusion of the theorem.

3.2 Algorithm 2
We next propose the second nonmonotone gradient method for (1.1).

Algorithm 2:

Step 0. Initialization: Choose parameters $\eta \in (0,1)$, $\sigma \in (0,1)$, $0 < \underline{\alpha} < \bar{\alpha}$, $0 < \underline{b} \le \bar{b}$, and an integer $M \ge 0$. Let $x^0 \in S$ be the initial vector. Set $k := 0$.

Step 1. Choose $\underline{b}I \preceq B_k \preceq \bar{b}I$. Solve Problem (2.2) with $x := x^k$. If $d_{B_k}(x^k) = 0$, then stop: $x^k$ is a critical point. Otherwise, set $d^k := d_{B_k}(x^k)$.

Step 2. Choose $\alpha_k^0 \in [\underline{\alpha}, \bar{\alpha}]$. Find the smallest integer $j \ge 0$ such that $\alpha_k := \alpha_k^0 \eta^j$ satisfies
$$f(x^k + \alpha_k d^k) \preceq \max_{[k-M]^+ \le i \le k} f(x^i) + \sigma \alpha_k Jf(x^k)d^k, \quad x^k + \alpha_k d^k \in S. \qquad (3.44)$$

Step 3. Update: Let $x^{k+1} := x^k + \alpha_k d^k$. Set $k := k+1$ and return to Step 1.

Remark. Algorithm 2 differs from Algorithm 1 in two respects. First, $B_k$ can be chosen arbitrarily subject to the constraint $\underline{b}I \preceq B_k \preceq \bar{b}I$, whereas the matrix in Algorithm 1 must be updated. Second, the step size $\alpha_k$ satisfying (3.44) is used in the iteration of Algorithm 2, while the step size in Algorithm 1 is always taken to be 1. Note that Algorithm 2 is closely related to the method of Tseng and Yun (2009): when $f$ reduces to a scalar-valued function and $M = 0$, Algorithm 2 reduces to a gradient descent method. Since Algorithm 2 is generally a nonmonotone method when $M \ge 1$ and aims at a vector optimization problem, most of the global and local convergence proofs of Tseng and Yun (2009) do not carry over to it. In addition, Algorithm 2 can be viewed as an extension of the projected gradient method of Bello Cruz, Lucambio Pérez, and Melo (2011). When $M = 0$, the term $\max_{[k-M]^+ \le i \le k} f(x^i)$ reduces to $f(x^k)$ and Algorithm 2 becomes a monotone method. As in Algorithm 1, we use the nonmonotone strategy ($M \ge 1$), since the nonmonotone technique helps to avoid the situation in which the sequence of iterates creeps along the bottom of a curved narrow valley, a common occurrence in difficult nonlinear problems (see Toint, 1996).
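The backtracking loop of Step 2 is straightforward to implement. The following Python sketch (our own illustration; function and variable names are not from the paper) checks condition (3.44) componentwise for the case $C = \mathbb{R}_+^m$, in which $\preceq$ is the componentwise order; the feasibility test $x^k + \alpha_k d^k \in S$ is omitted because $S = \mathbb{R}^2$ in the toy example.

```python
import numpy as np

def nonmonotone_step(f, Jf, x, d, hist, M=10, sigma=0.5, eta=0.5, alpha0=1.0):
    """Backtracking search for the nonmonotone condition (3.44) with C = R_+^m:
    f(x + a*d) <= max_{[k-M]+ <= i <= k} f(x^i) + sigma * a * Jf(x) d  (componentwise).
    `hist` holds the past objective vectors f(x^i); `d` is a descent direction."""
    ref = np.max(np.asarray(hist[-(M + 1):]), axis=0)  # componentwise max over the window
    g = Jf(x) @ d                                      # directional derivatives, all < 0
    a = alpha0
    while not np.all(f(x + a * d) <= ref + sigma * a * g):
        a *= eta                                       # a = alpha0 * eta^j, smallest such j
    return a

# Toy bi-objective example: f(x) = (||x||^2, ||x - e_1||^2) on R^2.
e1 = np.array([1.0, 0.0])
f = lambda x: np.array([x @ x, (x - e1) @ (x - e1)])
Jf = lambda x: np.array([2 * x, 2 * (x - e1)])
x = np.array([2.0, 2.0])
d = -0.5 * (Jf(x)[0] + Jf(x)[1])        # a common descent direction for both objectives
alpha = nonmonotone_step(f, Jf, x, d, hist=[f(x)], M=0)
```

With $M = 0$ this reduces to the monotone Armijo-type search; a larger `hist` window makes the acceptance test laxer, which is exactly the nonmonotone effect discussed above.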
The following result shows that Algorithm 2 is well defined.

Lemma 3.5. If $x^k$ is noncritical for (1.1), there exists $\tilde{\alpha} > 0$ such that $d^k := d_{B_k}(x^k)$ satisfies (3.44) whenever $\alpha_k \in (0, \tilde{\alpha}]$.

Proof. Similarly to the proof of Lemma 2.1 of Tseng and Yun (2009), we can prove that
$$f(x^k + \alpha d^k) \preceq f(x^k) + \alpha Jf(x^k)d^k + o(\alpha) \qquad (3.45)$$
$$\preceq \max_{[k-M]^+ \le i \le k} f(x^i) + \alpha Jf(x^k)d^k + o(\alpha), \quad \forall \alpha \in (0,1]. \qquad (3.46)$$
This result together with $Jf(x^k)d^k \prec 0$ (since $x^k$ is noncritical) implies that the assertion holds.
3.2.1 Convergence

To prove the global and local convergence results, we need the following technical results. We first prove that $\{\alpha_k d^k\}$ converges to zero and that the sequence of objective values $\{f(x^k)\}$ also converges.

Lemma 3.6. Suppose that the sequences $\{d^k\}$ and $\{x^k\}$ are generated by Algorithm 2 and the sequence $\{f(x^k)\}$ is $C$-bounded from below. Then the following two limits hold: $\lim_{k\to\infty} \alpha_k d^k = 0$ and $\lim_{k\to\infty} f(x^k) = f^*$ for some $f^* \in \mathbb{R}^m$.

Proof. From the definition of $d^k$ and $B_k \succeq \underline{b}I$, for any $u \in C^*$ we have
$$u^T Jf(x^k)d^k \le -\frac{1}{2}(d^k)^T B_k d^k \le -\frac{1}{2}\underline{b}\,\|d^k\|^2. \qquad (3.47)$$
This result, along with the relation $\alpha_k \le \alpha_k^0 \le \bar{\alpha}$, means that
$$\alpha_k^2 \|d^k\|^2 \le -\frac{2\bar{\alpha}}{\underline{b}}\,\alpha_k\, u^T Jf(x^k)d^k, \quad \forall u \in C^*. \qquad (3.48)$$
Define $l(k)$ as in the proof of Lemma 3.2. Then, similarly to the proof of (3.17), we can prove that $\{x^k\}$ satisfies (3.17) for some $f^*$. We next show that the following two limits hold for all $j \ge 1$:
$$\lim_{k\to\infty} \alpha_{l(k)-j}\, d^{l(k)-j} = 0, \qquad \lim_{k\to\infty} f(x^{l(k)-j}) = f^*. \qquad (3.49)$$
We prove this by induction. Replacing $k$ with $l(k)-1$ in (3.44) gives
$$f(x^{l(k)}) \preceq f(x^{l(l(k)-1)}) + \sigma \alpha_{l(k)-1} Jf(x^{l(k)-1})d^{l(k)-1}, \qquad (3.50)$$
which together with (3.17) and (3.48) implies that $\lim_{k\to\infty} \alpha_{l(k)-1} Jf(x^{l(k)-1})d^{l(k)-1} = 0$. From this result and (3.48) again, the first limit of (3.49) holds for $j = 1$. From this result, (3.17), and the assumption on $f$, it is easy to show that the second limit of (3.49) holds for $j = 1$. We next show that if (3.49) holds for $j$, then it also holds for $j+1$. From (3.44), we have
$$f(x^{l(k)-j}) \preceq f(x^{l(l(k)-j-1)}) + \sigma \alpha_{l(k)-j-1} Jf(x^{l(k)-j-1})d^{l(k)-j-1}, \qquad (3.51)$$
which, along with (3.17) and the induction assumption $\lim_{k\to\infty} f(x^{l(k)-j}) = f^*$, implies $\lim_{k\to\infty} \alpha_{l(k)-j-1} Jf(x^{l(k)-j-1})d^{l(k)-j-1} = 0$. From this result and (3.48), we get
$$\lim_{k\to\infty} \alpha_{l(k)-j-1}\, d^{l(k)-j-1} = 0,$$
which together with the induction assumption and the assumption on $\{f(x^k)\}$ yields $\lim_{k\to\infty} f(x^{l(k)-j-1}) = f^*$. Therefore, (3.49) holds for $j+1$. The assertion of this lemma then follows from (3.49) and a proof similar to that of Lemma 3.2.

We next prove that Algorithm 2 is globally convergent.
Theorem 3.7. Suppose that the sequence $\{x^k\}$ is generated by Algorithm 2 and the sequence $\{f(x^k)\}$ is $C$-bounded from below. Then any accumulation point of $\{x^k\}$ is a critical point of (1.1).

Proof. We argue by contradiction: suppose there is an accumulation point $x^*$ of $\{x^k\}$ that is noncritical for (1.1), and denote by $K$ a subsequence such that $\{x^k\}_{k\in K} \to x^*$. We first show that $\liminf_{k\in K,\, k\to\infty} \|d^k\| > 0$. If this were not true, we could assume without loss of generality that $\{d^k\}_{k\in K} \to 0$. Since $d^k$ is an optimal solution to (2.4) with $x = x^k$ and $B = B_k$, there exist a positive integer $q$, $u^j \in U$ and $\lambda_j > 0$, $j = 1,\dots,q$, with $\sum_{j=1}^q \lambda_j = 1$ such that
$$0 \in \sum_{j=1}^q \lambda_j Jf(x^k)^T u^j + B_k d^k + 2\gamma d^k + N_S(x^k + d^k), \qquad (3.52)$$
$$\gamma\big(1 - \|d^k\|^2\big) = 0, \quad \gamma \ge 0, \quad 1 - \|d^k\|^2 \ge 0. \qquad (3.53)$$
Taking limits on both sides of (3.52) and (3.53) as $k \in K \to \infty$, and using the semicontinuity of $N_S(\cdot)$ (see Theorem 24.4 of Rockafellar, 1970) together with $\{d^k\}_{k\in K} \to 0$, we see that
$$0 \in \sum_{j=1}^q \lambda_j Jf(x^*)^T u^j + N_S(x^*), \qquad (3.54)$$
which means that $x^*$ is a stationary point of (1.1). This contradicts the noncriticality assumption on $x^*$; hence $\liminf_{k\in K,\, k\to\infty} \|d^k\| > 0$. Similarly to the proof of (3.2) in Lemma 3.1, we obtain
$$\|d^k\| \le -\frac{2\, u^T Jf(x^k) d^k / \|d^k\|}{\lambda_{\min}(B_k)}, \quad \forall k \in K,\ \forall u \in C^*. \qquad (3.55)$$
From (3.55), $\{x^k\}_{k\in K} \to x^*$, $B_k \succeq \underline{b}I$, and $\liminf_{k\in K,\, k\to\infty} \|d^k\| > 0$, we conclude that $\{d^k\}_{k\in K}$ is bounded. Further, from (3.48), we have $\limsup_{k\in K,\, k\to\infty} u^T Jf(x^k) d^k < 0$ for all $u \in C^*$. This together with Lemma 3.6 implies that $\{\alpha_k\}_{k\in K} \to 0$. Then, from the specific choice of $\alpha_k$, there exists some index $\bar{k}$ such that $\alpha_k < \alpha_k^0$ and $\alpha_k < \eta$ for all $k \in K$ with $k \ge \bar{k}$. Define $\bar{\alpha}_k := \alpha_k/\eta$. Then $\{\bar{\alpha}_k\}_{k\in K} \to 0$ and $0 < \bar{\alpha}_k \le 1$ for all $k \in K$. From the choice of $\alpha_k$ in Algorithm 2, we obtain that, for all $k \in K$ with $k \ge \bar{k}$, there exists $\bar{u} \in C^*$ such that
$$\bar{u}^T f(x^k + \bar{\alpha}_k d^k) > \max_{[k-M]^+ \le i \le k} \bar{u}^T f(x^i) + \sigma \bar{\alpha}_k\, \bar{u}^T Jf(x^k)d^k. \qquad (3.56)$$
On the other hand, by the assumption on $f$, $\limsup_{k\in K,\, k\to\infty} u^T Jf(x^k)d^k < 0$ for all $u \in C^*$, and the boundedness of $\{d^k\}_{k\in K}$, we obtain that, for sufficiently large $k \in K$ and all $u \in C^*$,
$$\begin{aligned} u^T f(x^k + \bar{\alpha}_k d^k) &= u^T f(x^k) + \bar{\alpha}_k\, u^T Jf(x^k)d^k + o(\bar{\alpha}_k \|d^k\|) \\ &\le \max_{[k-M]^+ \le i \le k} u^T f(x^i) + \bar{\alpha}_k\, u^T Jf(x^k)d^k + o(\bar{\alpha}_k \|d^k\|) \\ &< \max_{[k-M]^+ \le i \le k} u^T f(x^i) + \sigma \bar{\alpha}_k\, u^T Jf(x^k)d^k, \end{aligned} \qquad (3.57)$$
where the last inequality holds because $(1-\sigma)\bar{\alpha}_k\, u^T Jf(x^k)d^k + o(\bar{\alpha}_k\|d^k\|) < 0$ for sufficiently large $k \in K$. Taking $u = \bar{u}$, (3.57) clearly contradicts (3.56). This contradiction leads to the conclusion of the theorem.
We next show the local linear rate of convergence of Algorithm 2.

Theorem 3.8. Suppose that $\{x^k\}$ is generated by Algorithm 2, Assumption 1 holds, and the sequence $\{f(x^k)\}$ is $C$-bounded from below. Then there exists some constant $c \in (0,1)$ such that
$$f(x^{l(k)}) - f^* \preceq c\,\big(f(x^{l(l(k)-1)}) - f^*\big), \qquad (3.58)$$
where $l(k)$ is defined as in Lemma 3.2 and $f^* = \lim_{k\to\infty} f(x^k)$.

Proof. From the specific choice of $\alpha_k$ and Lemma 3.6 of Lu and Zhang (2012), we see that $\inf_k \alpha_k > 0$. This result, along with Lemma 3.6, means that $\{d^k\} \to 0$. Similarly to the proof of (3.33), the following equality holds:
$$\|d_I(x^k)\| = O(\|d^k\|). \qquad (3.59)$$
Hence $\{d_I(x^k)\} \to 0$, and by an argument similar to that in the proof of Theorem 3.4, there exist $c_1 > 0$, some index $\hat{k}$, $v \in \mathbb{R}^m$, and $\bar{x}^{l(k)-1} \in \bar{S}$ such that
$$\|x^{l(k)-1} - \bar{x}^{l(k)-1}\| \le c_1 \|d^{l(k)-1}\|, \qquad f(\bar{x}^{l(k)-1}) = v, \quad \forall k \ge \hat{k}. \qquad (3.60)$$
Then, similarly to the proof of (3.38), we can prove that (3.38) holds for $\{x^k\}$, $f^*$, and $v$. According to the assumption on $f$ and $\underline{b}I \preceq B_k \preceq \bar{b}I$, for any $u \in C^*$ there exist $L$ and $\bar{x}^k$ lying on the segment joining $x^{l(k)}$ with $\bar{x}^{l(k)-1}$ such that, for $k \ge \hat{k}$,
$$\begin{aligned} u^T(f(x^{l(k)}) - v) &= u^T\big(f(x^{l(k)}) - f(\bar{x}^{l(k)-1})\big) = u^T Jf(\bar{x}^k)(x^{l(k)} - \bar{x}^{l(k)-1}) \\ &= u^T\big(Jf(\bar{x}^k) - Jf(x^{l(k)-1})\big)(x^{l(k)} - \bar{x}^{l(k)-1}) - \big(B_{l(k)-1} d^{l(k)-1}\big)^T(x^{l(k)} - \bar{x}^{l(k)-1}) \\ &\quad + \Big[Jf(x^{l(k)-1})^T u + B_{l(k)-1} d^{l(k)-1}\Big]^T(x^{l(k)} - \bar{x}^{l(k)-1}) \\ &\le L\,\|\bar{x}^k - x^{l(k)-1}\|\,\|x^{l(k)} - \bar{x}^{l(k)-1}\| + \bar{b}\,\|d^{l(k)-1}\|\,\|x^{l(k)} - \bar{x}^{l(k)-1}\| \\ &\quad + (\alpha_{l(k)-1} - 1)\big((d^{l(k)-1})^T B_{l(k)-1} d^{l(k)-1} + u^T Jf(x^{l(k)-1})d^{l(k)-1}\big). \end{aligned} \qquad (3.61)$$
From the choice of $\bar{x}^k$, (3.34), and $\alpha_k \le 1$, we have, for $k \ge \hat{k}$,
$$\|\bar{x}^k - x^{l(k)-1}\| \le \|x^{l(k)} - x^{l(k)-1}\| + \|x^{l(k)-1} - \bar{x}^{l(k)-1}\| \le (1 + c_1)\|d^{l(k)-1}\|. \qquad (3.62)$$
Similarly, for $k \ge \hat{k}$, $\|x^{l(k)} - \bar{x}^{l(k)-1}\| \le (1 + c_1)\|d^{l(k)-1}\|$. It then follows from these inequalities, $B_k \succeq \underline{b}I$, $\alpha_k \le 1$, and (3.61) that, for $k \ge \hat{k}$ and any $u \in C^*$,
$$u^T(f(x^{l(k)}) - v) \le -c_2\, u^T Jf(x^{l(k)-1})d^{l(k)-1}$$
for some constant $c_2 > 0$. The remainder of the proof follows that of Theorem 3.4.

In the rest of this section, we discuss the convergence of Algorithm 2 for a special class of vector optimization problems with $C := \mathbb{R}_+^m$, without Assumption 1.

Theorem 3.9. Suppose that the following conditions hold:
1. $C := \mathbb{R}_+^m$;
2. $B_k := I$ for all $k$ in Algorithm 2;
3. $M := 0$ in Algorithm 2.
Then, for all $x \in S$ and the sequence $\{x^k\}$ generated by Algorithm 2, there exist $\{u_j^k\}_{j=1}^m \subset [0,1]$ satisfying $\sum_{j=1}^m u_j^k = 1$ such that
$$\|x^{k+1} - x\|^2 - \|x^k - x\|^2 \le 2\alpha_k \sum_{j=1}^m u_j^k\, \nabla f_j(x^k)^T (x - x^k) + \frac{2}{\sigma} \sum_{j=1}^m \big(f_j(x^k) - f_j(x^{k+1})\big).$$

Proof. When $C = \mathbb{R}_+^m$ and $B_k = I$, the subproblem at the $k$-th iteration of Algorithm 2 reduces to
$$\min_d\ \max_{j\in\{1,\dots,m\}} \nabla f_j(x^k)^T d + \frac{1}{2}\|d\|^2, \quad \text{s.t.}\ \|d\| \le 1,\ x^k + d \in S. \qquad (3.63)$$
Because the objective function of the above problem is strongly convex, the optimal solution $d^k$ satisfies the following inequality:
$$(d - d^k)^T v^k \ge 0, \quad \forall\, \|d\| \le 1,\ x^k + d \in S, \qquad (3.64)$$
where $v^k \in \partial\phi_k(d^k)$ and $\phi_k(d) := \max_{j\in\{1,\dots,m\}} \nabla f_j(x^k)^T d + \frac{1}{2}\|d\|^2$. According to the definition of $\phi_k$, there exist $u_j^k \ge 0$, $j = 1,\dots,m$, with $\sum_{j=1}^m u_j^k = 1$ such that
$$v^k = d^k + \sum_{j=1}^m u_j^k\, \nabla f_j(x^k). \qquad (3.65)$$
By direct computation, for any $x \in S$, define
$$\gamma_k := \|x^k - x\|^2 - \|x^{k+1} - x\|^2 + \|x^{k+1} - x^k\|^2 = 2(x^{k+1} - x^k)^T(x - x^k).$$
This together with $x^{k+1} = x^k + \alpha_k d^k$ implies that
$$\gamma_k = 2\alpha_k (x - x^k)^T d^k = 2\alpha_k (x - x^k - d^k)^T d^k + 2\alpha_k\|d^k\|^2. \qquad (3.66)$$
From (3.65) and (3.66),
$$\begin{aligned} \gamma_k - 2\alpha_k\|d^k\|^2 &= 2\alpha_k (x - x^k - d^k)^T \Big(v^k - \sum_{j=1}^m u_j^k\, \nabla f_j(x^k)\Big) \\ &= 2\alpha_k \Big((x - x^k - d^k)^T v^k - (x - x^k - d^k)^T \sum_{j=1}^m u_j^k\, \nabla f_j(x^k)\Big) \\ &\ge 2\alpha_k \Big((x^k - x)^T \sum_{j=1}^m u_j^k\, \nabla f_j(x^k) + (d^k)^T \sum_{j=1}^m u_j^k\, \nabla f_j(x^k)\Big), \end{aligned} \qquad (3.67)$$
where the inequality comes from (3.64). The above inequality implies that
$$\|x^k - x\|^2 - \|x^{k+1} - x\|^2 + \alpha_k(\alpha_k - 2)\|d^k\|^2 = \gamma_k - 2\alpha_k\|d^k\|^2 \ge 2\alpha_k \Big((x^k - x)^T \sum_{j=1}^m u_j^k\, \nabla f_j(x^k) + (d^k)^T \sum_{j=1}^m u_j^k\, \nabla f_j(x^k)\Big). \qquad (3.68)$$
Together with (3.44) and $M = 0$, (3.68) means that
$$\|x^{k+1} - x\|^2 - \|x^k - x\|^2 \le 2\alpha_k (x - x^k)^T \sum_{j=1}^m u_j^k\, \nabla f_j(x^k) + \frac{2}{\sigma} \sum_{j=1}^m u_j^k\big(f_j(x^k) - f_j(x^{k+1})\big). \qquad (3.69)$$
It follows from $f(x^{k+1}) \preceq f(x^k)$ and $u_j^k \ge 0$ for all $j, k$ that
$$\sum_{j=1}^m u_j^k\big(f_j(x^k) - f_j(x^{k+1})\big) \le \sum_{j=1}^m \big(f_j(x^k) - f_j(x^{k+1})\big). \qquad (3.70)$$
Therefore, the assertion follows from (3.69) and (3.70).

4 Multiobjective Portfolio Optimization

In this section, we discuss an application of the proposed algorithms to multi-criteria portfolio optimization problems.

4.1 Models
Suppose that a capital market has a set of risky assets, each with uncertain returns. Let $x_j \in \mathbb{R}_+$, $j = 1,\dots,n$, denote the proportion invested in asset $j$, where $\sum_{j=1}^n x_j = 1$ and $x_j \ge 0$, $j = 1,\dots,n$, means that short selling is not permitted. The stochastic returns of the investment $x := (x_1,\dots,x_n)^T \in \mathbb{R}^n$ in the assets are given by $r(\omega) := (r_1(\omega),\dots,r_n(\omega))^T : \Omega \to \mathbb{R}^n$, where $\Omega$ is the known support. The portfolio's return is then the random variable $\tilde{R}(x) := \sum_{j=1}^n x_j r_j(\omega)$, $\omega \in \Omega$. The mean-variance model selects a portfolio by maximizing the expected return while minimizing the variance of $\tilde{R}(x)$ subject to some constraints, that is, by solving the following bicriteria optimization problem:
$$\min_{x\in\mathbb{R}^n}\ f(x) := \big(-E[\tilde{R}(x)],\ \sigma^2(\tilde{R}(x))\big) \quad \text{s.t.}\ \sum_{j=1}^n x_j = 1,\ x_j \ge 0,\ j = 1,\dots,n, \qquad (4.1)$$
where $E[\tilde{R}(x)]$ and $\sigma^2(\tilde{R}(x)) := E[(\tilde{R}(x) - E[\tilde{R}(x)])^2]$ are the expectation and the variance of $\tilde{R}(x)$, respectively. If the portfolio return from the choice $x := (x_1,\dots,x_n)$ is $\tilde{R}(x) = x_1 R_1 + \cdots + x_n R_n$, then $\sigma^2(\tilde{R}(x)) = \sum_{i=1}^n \sum_{j=1}^n x_i x_j \sigma_{ij}$, where $\sigma_{ij}$ is the covariance of $R_i$ and $R_j$; thus the variance is a quadratic function of $x_1,\dots,x_n$. Since $\tilde{R}$ is linear in $x$, the variance $\sigma^2(\tilde{R}(x))$ is convex in $x$, making the above bicriteria optimization problem convex.

4.2 Numerical Tests
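With $L$ equally probable return scenarios (as used in the numerical tests below), the objective of (4.1) and its Jacobian can be evaluated directly from the scenario data. A small numpy sketch (our own illustration, not the Matlab code used in the experiments; it requires $L \ge 2$ scenarios):

```python
import numpy as np

def portfolio_objective(R, x):
    """Evaluate f(x) = (-E[R~(x)], Var[R~(x)]) of (4.1) and its Jacobian
    from an L x n matrix R of scenario returns, each with probability 1/L."""
    port = R @ x                                  # portfolio return in each scenario
    mu = R.mean(axis=0)                           # per-asset expected returns
    Sigma = np.cov(R, rowvar=False, bias=True)    # covariance matrix (sigma_ij), 1/L weights
    f = np.array([-port.mean(), port.var()])      # (-expected return, variance x^T Sigma x)
    Jf = np.vstack([-mu, 2.0 * Sigma @ x])        # Jacobian rows: -mu and 2 Sigma x
    return f, Jf

# Two scenarios, two assets: asset 1 pays 10% in scenario 1, asset 2 in scenario 2.
f_val, Jf_val = portfolio_objective(np.array([[0.1, 0.0], [0.0, 0.1]]), np.array([0.5, 0.5]))
```

Note that `bias=True` and the default `ddof=0` of `var` both use $1/L$ weights, so the quadratic form $x^T \Sigma x$ and the sample variance of the portfolio returns agree exactly.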
In our numerical tests, we use the following procedure to solve Problem (2.2). Given a point $x \in S$, we solve the linear optimization problem
$$\max_{u\in C^*}\ u^T f(x), \quad \text{s.t.}\ \sum_{i=1}^m u_i = 1,\ u_i \ge 0,\ i = 1,\dots,m. \qquad (4.2)$$
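When $C = \mathbb{R}_+^m$, the feasible set of (4.2) is the unit simplex, so this LP needs no solver: a linear function attains its maximum over the simplex at the vertices $e_i$ for which $f_i(x)$ is maximal, and the solution set is their convex hull. A sketch computing the maximizing vertices (function name is ours):

```python
import numpy as np

def U_vertices(f_vals, tol=1e-12):
    """Vertices of the solution set of the LP (4.2) over the unit simplex:
    the standard basis vectors e_i for which f_i(x) attains max_j f_j(x)."""
    f_vals = np.asarray(f_vals, dtype=float)
    active = np.flatnonzero(f_vals >= f_vals.max() - tol)   # indices of maximal components
    return [np.eye(len(f_vals))[i] for i in active]

verts = U_vertices([0.3, 0.7, 0.7])   # ties: two maximizing vertices e_2 and e_3
```

Since the constraints of (4.3) below are linear in $u$, it suffices to impose them at these finitely many vertices.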
Let $U(x)$ denote the solution set of the above problem for the given $x \in S$. Given a positive definite matrix $B \in \mathbb{R}^{n\times n}$ and a sufficiently small $\epsilon > 0$, we denote by $d_B(x)$ the solution of the following problem, whenever an optimal solution exists:
$$\min_{\beta,\, d}\ \beta + \frac{1}{2} d^T B d, \quad \text{s.t.}\ u^T Jf(x)\, d \le \beta,\ u \in U(x),\ \|d\| \le 1,\ \beta \le -\epsilon,\ x + d \in S. \qquad (4.3)$$
Problem (4.3) is denoted by $\mathrm{Sub}_B(x, \epsilon)$. The constraint $\beta \le -\epsilon$ serves as a termination rule: for a given $x \in S$, when there is no direction $d$ with $\beta \le -\epsilon$, we take $x$ as an approximate critical point. Differently from Lemma 2.4, the following lemma provides an alternative characterization of criticality.

Lemma 4.1. For any positive definite matrix $B \in \mathbb{R}^{n\times n}$, $x \in S$ is a critical point of Problem (1.1) if and only if (4.3) has no solution for any given $\epsilon > 0$.

Proof. From Definition 2.2, if $x \in S$ is not a critical point, then there is a direction $d \in \mathbb{R}^n$ such that $Jf(x)d \in -\mathrm{int}(C)$, i.e., $Jf(x)d \prec 0$. This implies that if $x \in S$ is a critical point of Problem (1.1), then $u^T Jf(x)d \ge 0$ for every $d \in \mathbb{R}^n$ and $u \in C^*$; hence the feasible set of Problem (4.3) is empty and (4.3) has no solution. Conversely, if (4.3) has no solution for any given $\epsilon > 0$, then $x$ is a critical point of Problem (1.1).
For simplicity, we denote by $\mathrm{Sub}_B(x, \epsilon)$, for given $\epsilon \ge 0$, both the problem and its optimal value (with the convention that $\mathrm{Sub}_B(x, \epsilon) = +\infty$ if the problem is infeasible). If $\epsilon = 0$ in Problem (4.3), its feasible set is always nonempty, since $d = 0$ is always feasible. When $\epsilon > 0$, (4.3) may be infeasible, and by Lemma 4.1, $x \in S$ is an approximate critical point of Problem (1.1) when it is infeasible. Further, we have the following conclusions.

Lemma 4.2. For any given positive definite matrix $B$, the following conclusions hold:
(i) for any given noncritical point $x$, the feasible set of Problem (4.3) is nonempty and closed for any $\epsilon > 0$;
(ii) for any given noncritical point $x$, $\mathrm{Sub}_B(x, \epsilon)$ converges to $\mathrm{Sub}_B(x, 0)$ as $\epsilon \to 0$;
(iii) for any given $\epsilon > 0$, $\mathrm{Sub}_B(\cdot, \epsilon)$ is discontinuous at the critical points $x$.

Proof. (i) The conclusion follows directly from Lemma 4.1.
(ii) Because the feasible region of $\mathrm{Sub}_B(x, \epsilon)$ is a subset of the feasible region of $\mathrm{Sub}_B(x, 0)$, we have $\mathrm{Sub}_B(x, \epsilon) \ge \mathrm{Sub}_B(x, 0)$ for all $\epsilon > 0$. Hence we only have to show that
$$\forall \delta > 0,\ \exists\, \tilde{\epsilon} > 0 \text{ such that } \mathrm{Sub}_B(x, \epsilon) \le \mathrm{Sub}_B(x, 0) + \delta, \quad \forall \epsilon \in (0, \tilde{\epsilon}],$$
which follows from the construction of the problem.
(iii) By Lemma 4.1, the feasible set of (4.3) is empty at a critical point when $\epsilon > 0$. Therefore the assertion holds.

Next, we present the numerical applications of Algorithms 1 and 2. The codes are written in Matlab 7.0, and the tests are conducted on a DELL computer with an Intel(R) Core(TM) i5-2400 processor (3.10 GHz) and 4.00 GB of memory. Let the sample space be $\Omega := \{\omega_1,\dots,\omega_L\}$ and suppose that $p_j = P\{\omega_j \in \Omega\} = 1/L$, $j = 1,\dots,L$; the random variable $\omega$ is uniformly distributed on $[0,1]$, and we take $L = 200$ in the following numerical tests. The parameters in Algorithm 1 are set as follows:
$\eta = 1.5$, $\sigma = 0.5$, $\underline{b} = 1$, $\bar{b} = 2$, integer $M = 10$, $\theta_0^0 = 1$, and $B_0 = I$; the parameters in Algorithm 2 are set as follows: $\eta = 0.5$, $\sigma = 0.5$, $\underline{\alpha} = \underline{b} = 0.5$, $\bar{\alpha} = \bar{b} = 1.5$, $B_0 = I$, $M = 10$. We choose $\epsilon = 10^{-5}$ in our numerical tests.

We now solve the bi-objective optimization problem (4.1) by Algorithms 1 and 2, with the initial point randomly generated from the feasible set of (4.1), for $n = 5, 10, 50, 100, 150, 200, 300, 500$. To show the efficiency of the nonmonotone gradient methods, we compare the performance of Algorithms 1 and 2 with that of the corresponding monotone gradient methods, i.e., $M = 0$ in Algorithms 1 and 2. The number of iterations and the CPU time in seconds are reported in Figures 1-4. Generally, Algorithms 1 and 2 outperform the monotone gradient methods in both the number of iterations and the CPU time. The results suggest that our methods are efficient for all the tested problems and verify the theoretical assertions about the new algorithms.

Figure 1: CPU Time of Algorithm 1 with M = 0 and M = 10
Figure 2: Iterations of Algorithm 1 with M = 0 and M = 10
Figure 3: CPU Time of Algorithm 2 with M = 0 and M = 10
Figure 4: Iterations of Algorithm 2 with M = 0 and M = 10

We also compare the performance of Algorithms 1 and 2 with $M = 10$. The number of iterations and the CPU time in seconds are reported in Figures 5 and 6, respectively. Algorithm 1 outperforms Algorithm 2 in both the number of iterations and the CPU time.

Figure 5: CPU Time of Algorithms 1 and 2
Figure 6: Iterations of Algorithms 1 and 2

For the choice of the parameter $M$, to the best of our knowledge, there are no theoretical results on how to choose it. As recommended by Grippo, Lampariello and Lucidi (1986), $M = 10$ is often used (Zhang and Hager, 2004). To explore the dependence of our methods on the parameter $M$, we solve (4.1) with $n = 50$ and $M = 0, 1, 5, 10, 15, 20, 25, 30$. The CPU time of Algorithms 1 and 2 is reported in Figure 7, and the number of iterations in Figure 8. The results show that quite satisfactory performance is obtained for $5 \le M \le 15$.

Figure 7: CPU Time of Algorithms 1 and 2 with different choices of M
Figure 8: Iterations of Algorithms 1 and 2 with different choices of M

5 Conclusion
In this paper, we study a class of vector optimization problems with $C$-convex objective functions. Two nonmonotone gradient algorithms are presented, and the global and local convergence of the proposed algorithms is proved. To show the efficiency of the proposed algorithms, an application to portfolio optimization under bi-criteria considerations is given. Further studies may extend the proposed algorithms to more practical applications under multi-criteria considerations.

Acknowledgements. The work is supported by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2014043) and by the National Natural Science Foundation of China (No. 71571055). We thank the referees for their careful reading of the manuscript and constructive suggestions.
References

[1] BELLO CRUZ, J.Y. A subgradient method for vector optimization problems. SIAM J. Optim. 2013, 23(4), 2169-2182.
[2] BELLO CRUZ, J.Y., LUCAMBIO PÉREZ, L.R., MELO, J.G. Convergence of the projected gradient method for quasiconvex multiobjective optimization. Nonlinear Anal. 2011, 74, 5268-5273.
[3] BENTO, G.C., CRUZ NETO, J.X., SOUBEYRAN, A. A proximal point-type method for multicriteria optimization. Set-Valued Var. Anal. 2014, 22, 557-573.
[4] BONNEL, H., IUSEM, A.N., SVAITER, B.F. Proximal methods in vector optimization. SIAM J. Optim. 2005, 15, 935-970.
[5] BRITO, A.S., CRUZ NETO, J.X., SANTOS, P.S.M., SOUZA, S.S. A relaxed projection method for solving multiobjective optimization problems. Eur. J. Oper. Res. 2017, 256, 17-23.
[6] CARRIZO, G.A., LOTITO, P.A., MACIEL, M.C. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Math. Program. Ser. A 2016, 159, 339-369.
[7] DAI, Y.H. On the nonmonotone line search. J. Optim. Theory Appl. 2002, 112, 315-330.
[8] EHRGOTT, M. Multicriteria Optimization, Lecture Notes in Economics and Mathematical Systems 491. Springer-Verlag, Berlin, 2000.
[9] FLIEGE, J., GRAÑA DRUMMOND, L.M., SVAITER, B.F. Newton's method for multiobjective optimization. SIAM J. Optim. 2009, 20(2), 602-626.
[10] FLIEGE, J., SVAITER, B.F. Steepest descent methods for multicriteria optimization. Math. Methods Oper. Res. 2000, 51, 479-494.
[11] FUKUDA, E.H., GRAÑA DRUMMOND, L.M. On the convergence of the projected gradient method for vector optimization. Optimization 2011, 60(8-9), 1009-1021.
[12] FUKUDA, E.H., GRAÑA DRUMMOND, L.M. Inexact projected gradient method for vector optimization. Comput. Optim. Appl. 2013, 54, 473-493.
[13] GRAÑA DRUMMOND, L.M., IUSEM, A.N. A projected gradient method for vector optimization problems. Comput. Optim. Appl. 2004, 28, 5-29.
[14] GRIPPO, L., LAMPARIELLO, F., LUCIDI, S. A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 1986, 23, 707-716.
[15] HANDI, J., KELL, D.B., KNOWLES, J. Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Trans. Comput. Biol. Bioinform. 2007, 4, 279-290.
[16] IUSEM, A.N., SVAITER, B.F., TEBOULLE, M. Entropy-like proximal methods in convex programming. Math. Oper. Res. 1994, 19, 790-814.
[17] JAHN, J. Vector Optimization: Theory, Applications, and Extensions. Springer, Erlangen, 2003.
[18] JI, Y., LI, Y.J., ZHANG, K.C., ZHANG, X.L. A new nonmonotone trust-region method of conic model for solving unconstrained optimization. J. Comput. Appl. Math. 2010, 233, 1746-1754.
[19] LU, Z.S., ZHANG, Y. An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 2012, 135, 149-193.
[20] LUO, Z.Q., TSENG, P. Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 1993, 46, 157-178.
[21] PARDALOS, P.M., HEARN, D. Multi-criteria Decision Analysis via Ratio and Difference Judgement. Kluwer Academic Publishers, Dordrecht, 1999.
[22] QU, S.J., GOH, M., CHAN, F.T.S. Quasi-Newton methods for solving multiobjective optimization. Oper. Res. Lett. 2011, 39, 397-399.
[23] QU, S.J., GOH, M., JI, Y., SOUZA, R.D. A new algorithm for linearly constrained c-convex vector optimization with a supply chain network risk application. Eur. J. Oper. Res. 2015, 247, 359-365.
[24] QU, S.J., GOH, M., SOUZA, R.D., WANG, T.N. Proximal point algorithms for convex multi-criteria programs with applications to supply chain risk management. J. Optim. Theory Appl. 2014, 163(3), 949-956.
[25] QU, S.J., LIU, C., GOH, M., LI, Y.J., JI, Y. Nonsmooth multiobjective programming with quasi-Newton methods. Eur. J. Oper. Res. 2014, 235(3), 503-510.
[26] QU, S.J., GOH, M., ZHANG, X.J. A new hybrid method for nonlinear complementarity problems. Comput. Optim. Appl. 2011, 49, 493-520.
[27] ROCKAFELLAR, R.T. Convex Analysis. Princeton University Press, Princeton, 1970.
[28] ROBINSON, S.M. Some continuity properties of polyhedral multifunctions. Math. Prog. Study 1981, 14, 206-214.
[29] SHI, Z.J., WANG, S.Q. Nonmonotone adaptive trust region method. Eur. J. Oper. Res. 2011, 208, 28-36.
[30] TOINT, P. An assessment of nonmonotone linesearch techniques for unconstrained optimization. SIAM J. Sci. Comput. 1996, 17, 725-739.
[31] TSENG, P., YUN, S. A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 2009, 117, 387-423.
[32] VIEIRA, D.A.G., TAKAHASHI, R.H.C., SALDANHA, R.R. Multicriteria optimization with a multiobjective golden section line search. Math. Program. Ser. A 2012, 131, 131-161.
[33] VILLACORTA, K.D.V., OLIVEIRA, P.R. An interior proximal method in vector optimization. Eur. J. Oper. Res. 2011, 214, 485-492.
[34] ZHANG, H.C., HAGER, W.W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 2004, 14(4), 1043-1056.