Journal of Computational and Applied Mathematics 255 (2014) 765–777
Quasi-Newton’s method for multiobjective optimization

Žiga Povalej ∗
Turboinštitut d.d., Rovšnikova 7, 1210 Ljubljana, Slovenia
FMF, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia
Article history: Received 26 October 2011; received in revised form 17 May 2013.
Keywords: Multiobjective optimization; Newton’s method; Quasi-Newton’s method; Pareto optimum

Abstract. In this paper we present a quasi-Newton’s method for unconstrained multiobjective optimization of strongly convex objective functions. Strong convexity allows the Hessian matrices to be approximated by the well known BFGS formula. Approximating the Hessian matrices is usually faster than evaluating them exactly, as is done in, e.g., the recently proposed Newton’s method for multiobjective optimization. We propose and analyze a new algorithm and prove that its convergence is superlinear.

© 2013 Elsevier B.V. All rights reserved.
1. Introduction

In multiobjective optimization, several conflicting objectives have to be minimized simultaneously. Generally, no unique solution exists, but a set of mathematically equally good solutions can be identified by using the concept of Pareto optimality. Many applications of multiobjective optimization can be found in engineering [1–6], economics and finance [7,8], medicine [9–11], management and planning [12,13], etc.

There exist many strategies for solving multiobjective optimization problems. One of the basic approaches is the weighting method (see [14]), where a single-objective optimization problem is formed by weighting the several objective functions. The ε-constraint method, introduced in [15], suffers from a similar drawback: only a chosen objective function is minimized while the others are bounded. Some algorithms require a preordering of the functions according to their importance [16]: first the most important function is optimized, then the second, and so on. The disadvantage of these approaches is that the choice of weights, constraints, or the ordering of the functions, respectively, is not known in advance and has to be prespecified.

Some other techniques for multiobjective optimization that do not need any a priori information were developed in recent years, e.g., the steepest descent algorithm, studied in [17–19], which has at most linear convergence. Other methods rely on heuristics, especially evolutionary approaches; efficient evolutionary algorithms can be found in [20]. Unfortunately, no convergence proofs exist for them, and the empirical convergence is quite slow. For a survey of other multiobjective approaches see, e.g., [21,22].

Recently, a parameter-free method for unconstrained multiobjective optimization was developed [23]. It borrows the idea of Newton’s method for single-objective optimization. The necessary assumption is that the objective functions are twice continuously differentiable, but no other parameters or ordering of the functions are needed. The authors show that the rate of convergence is at least superlinear, and that it is quadratic if the second derivatives are Lipschitz continuous.

In this paper, we present a similar multiobjective optimization method, which approximates the second derivative matrices instead of evaluating them, thereby reducing the cost of one step. The rate of convergence is proven to be superlinear. This concept is analogous to the quasi-Newton’s method for single-objective optimization.
∗ Correspondence to: Turboinštitut d.d., Rovšnikova 7, 1210 Ljubljana, Slovenia. Tel.: +386 1 5820 154; fax: +386 1 5820 112. E-mail address: [email protected].
http://dx.doi.org/10.1016/j.cam.2013.06.045
The rest of the paper is organized as follows. Section 2 introduces the problem, notation and preliminaries. Section 3 derives a direction search program, solved via Karush–Kuhn–Tucker multipliers, and presents the algorithm considered. Section 4 establishes some theoretical results, which are needed in Section 5, where the main convergence result is proved. The outcomes of numerical simulations and commentaries are presented in Section 6, followed by conclusions in Section 7.

2. Notation and preliminaries

Let us start with some notation. Let R be the set of real numbers, R₊ (R₋) the set of strictly positive (negative) real numbers and N the set of positive integers. Assume that U ⊆ R^n, n ∈ N, is an open set, and let an objective function F : U −→ R^m, m ≥ 2, be given. The function F = (f_1, f_2, . . . , f_m) is a vector function whose components are scalar functions f_j : U −→ R, j = 1, 2, . . . , m.
Note that n and m are independent in general. Throughout the paper we will assume that F ∈ C²(U), i.e., F is twice continuously differentiable on U. For x ∈ U, let ∇f_j(x) ∈ R^n denote the gradient of f_j at x for all j = 1, 2, . . . , m. The matrix JF(x) ∈ R^{m×n} is the Jacobian matrix of F at x, i.e., the j-th row of JF(x) is ∇f_j(x)^T for all j = 1, 2, . . . , m. Let ∇²f_j(x) be the Hessian matrix of f_j at x, again for all j = 1, 2, . . . , m. By Im(M) we denote the range of a matrix M ∈ R^{m×n}, and the identity matrix is I ∈ R^{n×n}. For matrices M, N ∈ R^{n×n} we write M > N if M − N is positive definite. The Euclidean norm in R^n will be denoted by ∥ · ∥; we use the same notation for the induced operator norms on the corresponding matrix spaces. Let B(x, r) be the closed ball with center x ∈ R^n and radius r ∈ R₊.

We shall assume strong convexity of f_j on U for all j = 1, 2, . . . , m. A function f : U −→ R, f ∈ C²(U), is strongly convex if for all x, y ∈ U

    (∇f(x) − ∇f(y))^T (x − y) ≥ a ∥x − y∥²,    (1)

for some a > 0 (see [24]). It is easy to see that (1) is equivalent to

    ∇²f(x) ≥ aI    for all x ∈ U.
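The inequality (1) can be spot-checked numerically. The following sketch (illustrative only; the quadratic and all names are ours, not from the paper) samples (1) for a quadratic f(x) = (1/2) x^T Q x, for which the modulus a can be taken as the smallest eigenvalue of the Hessian Q:

```python
import numpy as np

# Numerical spot-check of the strong-convexity inequality (1) for a
# quadratic f(x) = 0.5 x^T Q x with Q > 0 (hypothetical data); the
# modulus a in (1) is the smallest eigenvalue of Q = Hessian of f.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
Q = M @ M.T + 3.0 * np.eye(3)        # symmetric positive definite Hessian
a = np.linalg.eigvalsh(Q).min()      # strong-convexity modulus

def grad(x):
    return Q @ x                     # gradient of f

for _ in range(100):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lhs = (grad(x) - grad(y)) @ (x - y)
    assert lhs >= a * np.linalg.norm(x - y) ** 2 - 1e-9
```

For this quadratic the check holds with equality precisely along the eigenvector of the smallest eigenvalue.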
Strong convexity implies strict and usual convexity. Hence, if the f_j are strongly convex, the Hessian matrices ∇²f_j(x) are positive definite for all x ∈ U and all j = 1, 2, . . . , m. Our task is to find a Pareto optimum of the objective function F.

Definition 1. A point x∗ ∈ U is a Pareto optimum if there is no y ∈ U for which F(y) ≤ F(x∗) and F(y) ≠ F(x∗) hold. The point x∗ ∈ U is a weak Pareto optimum if there is no y ∈ U such that F(y) < F(x∗).

The inequality signs ≤ and < are understood in a componentwise sense. Clearly, every Pareto optimum is also a weak Pareto optimum. Sometimes there exist points which are Pareto optima only on a restricted subset of U.

Definition 2. A point x∗ ∈ U is a local Pareto optimum if there exists a neighborhood V ⊆ U of x∗ such that x∗ is a Pareto optimum for F restricted to V. Similarly, a point x∗ ∈ U is a local weak Pareto optimum if there exists a neighborhood V ⊆ U of x∗ such that x∗ is a weak Pareto optimum for F restricted to V.

Let us first introduce a necessary condition for Pareto optimality. A point x∗ ∈ U is a stationary point of F if

    Im(JF(x∗)) ∩ R₋^m = ∅.    (2)
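Definition 1 amounts to a componentwise dominance test. As a hedged illustration (the function names below are ours, not the paper's), a naive O(N²) filter extracts the non-dominated vectors from a finite sample of objective values:

```python
import numpy as np

def dominates(Fy, Fx):
    """True if Fy Pareto-dominates Fx in the sense of Definition 1:
    Fy <= Fx componentwise and Fy != Fx."""
    Fy, Fx = np.asarray(Fy, float), np.asarray(Fx, float)
    return bool(np.all(Fy <= Fx) and np.any(Fy < Fx))

def pareto_filter(points):
    """Naive O(N^2) filter keeping only non-dominated objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# hypothetical sample of bicriteria objective values
pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_filter(pts)
# (3.0, 3.0) is dominated by (2.0, 2.0); the other three points remain
```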
This definition was first used in [17]. Note that for m = 1, (2) reduces to the classical condition ∇f(x∗) = 0. If x∗ is a stationary point of F, then (2) implies that for every p ∈ R^n there exists j₀ ∈ {1, 2, . . . , m} such that ∇f_{j₀}(x∗)^T p ≥ 0. If x ∈ U is not stationary, then there exists p ∈ R^n such that ∇f_j(x)^T p < 0 for all j = 1, 2, . . . , m. Since F is continuously differentiable,

    lim_{α→0} (f_j(x + αp) − f_j(x)) / α = ∇f_j(x)^T p < 0,    j = 1, 2, . . . , m.
Hence, p is a descent direction of F at x, i.e., there exists α₀ > 0 such that

    F(x + αp) < F(x)    for all α ∈ (0, α₀].

The following theorem presents a relation between Pareto optimality and stationarity. The proof can be found in [23].

Theorem 1. Suppose that F ∈ C¹(U).
(a) If x∗ is a local weak Pareto optimum, then x∗ is a stationary point of F.
(b) If U is convex, F is R^m-convex and x∗ ∈ U is a stationary point of F, then x∗ is a weak Pareto optimum.
(c) If U is convex, F ∈ C²(U), ∇²f_j(x) > 0 for all x ∈ U and all j = 1, 2, . . . , m, and if x∗ is a stationary point of F, then x∗ is a Pareto optimum.

The most popular quasi-Newton’s method for single-objective optimization is the BFGS (Broyden, Fletcher, Goldfarb and Shanno) method [25–28]. It is a line search method with a descent direction p^(k) = −(B^(k))⁻¹ ∇f(x^(k)), where f ∈ C¹(U) and B^(k) ∈ R^{n×n} is a positive definite symmetric matrix, updated at every iteration. The new iterate is x^(k+1) = x^(k) + α^(k) p^(k), with the step length α^(k) ∈ R₊. The BFGS update formula is given by

    B^(k+1) = B^(k) − (B^(k) s^(k) s^(k)T B^(k)) / (s^(k)T B^(k) s^(k)) + (y^(k) y^(k)T) / (s^(k)T y^(k)),    (3)
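The update (3) is a rank-two correction that can be coded in a few lines. The sketch below is an illustration under the assumption s^(k)T y^(k) > 0 (not the paper's MATLAB implementation); it also exhibits the secant equation B^(k+1) s^(k) = y^(k), which holds by construction:

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update (3): B - (B s s^T B)/(s^T B s) + (y y^T)/(s^T y).
    Preserves symmetry and, when s^T y > 0, positive definiteness."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

s = np.array([1.0, 0.5])
y = np.array([0.8, 1.2])            # here s^T y = 1.4 > 0 (curvature condition)
B1 = bfgs_update(np.eye(2), s, y)
# secant equation: B1 @ s equals y up to rounding
```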
where s^(k) = x^(k+1) − x^(k) = α^(k) p^(k) and y^(k) = ∇f(x^(k+1)) − ∇f(x^(k)). As shown in [24], B^(k+1) remains positive definite whenever B^(k) is positive definite. Two conditions must be satisfied for the BFGS method to be successful. The first one is the secant equation B^(k+1) s^(k) = y^(k), which is obtained by multiplying (3) by s^(k). The second necessary condition is the curvature condition s^(k)T y^(k) > 0. If f is strongly convex, the curvature condition is satisfied for any two points x^(k) and x^(k+1).

3. The quasi-Newton’s method

In this section we define the quasi-Newton’s direction for the multiobjective problem. We use definitions similar to those used in [23] for the Newton’s direction; instead of using exact Hessian matrices, we approximate them by the BFGS method. A direction p(x) is a quasi-Newton’s direction for x ∈ U if it is the optimal solution of the problem

    min_{p ∈ R^n} max_{j=1,2,...,m} ∇f_j(x)^T p + (1/2) p^T B_j(x) p,    (4)
where B_j(x) is some approximation of ∇²f_j(x), j = 1, 2, . . . , m; here we use the BFGS update formula (3). The optimal value and the solution will be denoted by

    τ(x) = min_{p ∈ R^n} max_{j=1,2,...,m} ∇f_j(x)^T p + (1/2) p^T B_j(x) p,    (5)

    p(x) = arg min_{p ∈ R^n} max_{j=1,2,...,m} ∇f_j(x)^T p + (1/2) p^T B_j(x) p.    (6)
The problem (4) can be formulated as a convex quadratic optimization problem and is equivalent to

    min  g(t, p) = t
    s.t. ∇f_j(x)^T p + (1/2) p^T B_j(x) p − t ≤ 0,    j = 1, 2, . . . , m,
         (t, p) ∈ R × R^n.    (7)

The Lagrangian of the problem is

    L((t, p), λ) = t + Σ_{j=1}^m λ_j (∇f_j(x)^T p + (1/2) p^T B_j(x) p − t).
From the Karush–Kuhn–Tucker (KKT) conditions we obtain that there exists λ = [λ_1, λ_2, . . . , λ_m]^T such that

    Σ_{j=1}^m λ_j (∇f_j(x) + B_j(x) p) = 0,    (8)
    λ_j ≥ 0,    j = 1, 2, . . . , m,    (9)
    Σ_{j=1}^m λ_j = 1,    (10)
    ∇f_j(x)^T p + (1/2) p^T B_j(x) p ≤ t,    j = 1, 2, . . . , m,    (11)
    λ_j (∇f_j(x)^T p + (1/2) p^T B_j(x) p − t) = 0,    j = 1, 2, . . . , m.    (12)

The solution (p(x), τ(x)) of (7) is unique and the problem has a Slater point (1, 0). Hence, there exist KKT multipliers λ_j = λ_j(x) for all j = 1, 2, . . . , m. Together with p = p(x) and t = τ(x), they satisfy (8)–(12). Eq. (8) yields

    p(x) = − (Σ_{j=1}^m λ_j(x) B_j(x))⁻¹ Σ_{j=1}^m λ_j(x) ∇f_j(x).    (13)
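For m = 2 objectives sharing a single approximation B (a simplifying assumption we introduce purely for illustration; the general problem with distinct B_j needs a QP solver), the KKT system (8)–(12) collapses, by Lagrangian duality, to a one-dimensional problem over λ ∈ [0, 1], and p(x) is recovered as in (13). A sketch:

```python
import numpy as np

def direction_bicriteria(g1, g2, B):
    """Quasi-Newton direction (4)-(6) for m = 2 objectives sharing one
    approximation B (illustrative simplifying assumption). By duality,
    tau = -min over lam in [0,1] of q(lam), where
    q(lam) = 0.5 v^T B^{-1} v with v = lam*g1 + (1-lam)*g2,
    and the primal solution recovers (13): p = -B^{-1} v."""
    Binv = np.linalg.inv(B)
    d = g1 - g2
    denom = d @ Binv @ d             # q is a convex quadratic in lam
    lam = 0.5 if denom == 0 else float(np.clip(-(g2 @ Binv @ d) / denom, 0.0, 1.0))
    v = lam * g1 + (1.0 - lam) * g2
    p = -Binv @ v
    tau = -0.5 * v @ Binv @ v
    return p, tau, lam

# toy data (ours): two gradients, B = I
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
p, tau, lam = direction_bicriteria(g1, g2, np.eye(2))
# lam = 0.5, p = (-0.5, -0.5), tau = -0.25
```

At the returned point both quadratic models in (4) attain the same value τ(x), matching the complementarity conditions (12).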
The stationarity of x and some other properties of τ are given in the following lemma.

Lemma 2. Let B_j(x) > 0 for all x ∈ U and consider τ as defined by (5). Then:
(a) For all x ∈ U, τ(x) ≤ 0.
(b) The following conditions are equivalent:
    1. The point x is not stationary.
    2. τ(x) < 0.
    3. p(x) ≠ 0.
(c) The function τ is continuous.

Proof. The proof of items (a) and (b) can be found in [23], where we replace the Hessian matrices ∇²f_j(x), j = 1, 2, . . . , m, by their approximations B_j(x), which are positive definite by construction.

Now we prove item (c). It is sufficient to show the continuity of τ on a fixed, arbitrarily chosen compact set W ⊂ U. Due to (a), for any x ∈ U,

    (1/2) p(x)^T B_j(x) p(x) ≤ −∇f_j(x)^T p(x)    (14)

holds for all j = 1, 2, . . . , m. The function F is twice continuously differentiable and the approximations B_j(x) are positive definite for all x ∈ W and j = 1, 2, . . . , m. Therefore, the eigenvalues of B_j(x) are uniformly bounded away from zero on W for all j = 1, 2, . . . , m. So there exist K, L ∈ R₊ such that

    K = max_{x∈W, j=1,2,...,m} ∥∇f_j(x)∥    and    L = min_{x∈W, ∥e∥=1, j=1,2,...,m} e^T B_j(x) e.

Combining (14) with the definitions of K and L and using the Cauchy–Schwarz inequality, we get

    (1/2) L ∥p(x)∥² ≤ ∥∇f_j(x)∥ ∥p(x)∥ ≤ K ∥p(x)∥

for all x ∈ W and all j = 1, 2, . . . , m. Hence,

    ∥p(x)∥ ≤ 2K/L

for all x ∈ W, i.e., the quasi-Newton’s directions are uniformly bounded on W. Define a family of functions {ϕ_{x,j}}_{x∈W, j=1,2,...,m}, where

    ϕ_{x,j} : W −→ R,    z ↦ ∇f_j(z)^T p(x) + (1/2) p(x)^T B_j(z) p(x).

We will show that this family of functions is uniformly equicontinuous. First, define an open cover {B(z, δ_z)}_{z∈W} with radii δ_z ∈ R₊. For any z ∈ W the ball B(z, δ_z) is formed in such a way that for every small ε_z ∈ R₊ there exists some δ_z ∈ R₊ such that for all w ∈ B(z, δ_z) the inequalities

    ∥B_j(w) − ∇²f_j(z)∥ < ε_z/2    and    ∥∇²f_j(w) − ∇²f_j(z)∥ < ε_z/2
hold for all j = 1, 2, . . . , m. The second inequality is true because of the continuity of the Hessian matrices. Since W is compact, there exists a finite subcover of {B(z, δ_z)}_{z∈W}, say {B(z_i, δ_{z_i}); i = 1, 2, . . . , l}. We write

    ϕ_{x,j}(z) = ∇f_j(z)^T p(x) + (1/2) p(x)^T B_j(z) p(x)
               = ∇f_j(z)^T p(x) + (1/2) p(x)^T ∇²f_j(z) p(x) + (1/2) p(x)^T (B_j(z) − ∇²f_j(z)) p(x).    (15)

Since the first two terms of (15) are continuous, only the last term must be considered. Take w_1, w_2 ∈ W such that ∥w_1 − w_2∥ < δ for some small δ ∈ R₊. Then there exist indices i_1, i_2 ∈ {1, 2, . . . , l} such that w_k ∈ B(z_{i_k}, δ_{z_{i_k}}), k = 1, 2. Therefore,

    |(1/2) p(x)^T (B_j(w_1) − ∇²f_j(w_1)) p(x) − (1/2) p(x)^T (B_j(w_2) − ∇²f_j(w_2)) p(x)|
    ≤ (1/2) ∥p(x)∥² (∥B_j(w_1) − ∇²f_j(z_{i_1})∥ + ∥∇²f_j(z_{i_1}) − ∇²f_j(w_1)∥ + ∥B_j(w_2) − ∇²f_j(z_{i_2})∥ + ∥∇²f_j(z_{i_2}) − ∇²f_j(w_2)∥)
    ≤ (1/2) ∥p(x)∥² (ε_{z_{i_1}} + ε_{z_{i_2}}),

so ϕ_{x,j} is uniformly continuous for all x ∈ W and all j = 1, 2, . . . , m. It follows that the family {ϕ_{x,j}}_{x∈W, j=1,2,...,m} is uniformly equicontinuous, and hence so is the family {φ_x = max_{j=1,2,...,m} ϕ_{x,j}}_{x∈W}. Next, recall the definition of a uniformly equicontinuous family of functions: for every ε ∈ R₊ there exists δ ∈ R₊ such that for all y, z ∈ W, ∥y − z∥ < δ implies |φ_x(y) − φ_x(z)| < ε for all x ∈ W. Thus, for ∥y − z∥ < δ,

    τ(z) ≤ max_{j=1,2,...,m} ∇f_j(z)^T p(y) + (1/2) p(y)^T B_j(z) p(y) = φ_y(z)
         ≤ φ_y(y) + |φ_y(z) − φ_y(y)| < τ(y) + ε.

Hence, τ(z) − τ(y) < ε. Interchanging the roles of y and z, we get |τ(z) − τ(y)| < ε, which proves the continuity of τ.

Having defined the unique descent direction p(x), we would like to choose an appropriate step length α. We use the classical Armijo condition for a scalar function f : U −→ R, presented in [24]. The Armijo condition for the quasi-Newton’s search direction p(x) reads

    f(x + αp(x)) ≤ f(x) + b α ∇f(x)^T p(x),

where b ∈ (0, 1). We want to accept a full quasi-Newton’s step (α = 1) close to a local optimum, where B(x) is a good approximation of ∇²f(x). This is possible when B(x) > 0 and b ∈ (0, 1/2), according to [29]. In the scalar case,

    τ(x) = (1/2) ∇f(x)^T p(x).

Hence, we rewrite the Armijo condition as

    f(x + αp(x)) ≤ f(x) + c α τ(x),

where c = 2b ∈ (0, 1). The last inequality (componentwise) will be the criterion for accepting a step in the multiobjective quasi-Newton’s direction. The next theorem was proved in [23].

Theorem 3. Assume that x ∈ U is a nonstationary point of F. Then for any 0 < c < 1 there exists α₀ ∈ (0, 1] such that

    x + αp(x) ∈ U    and    f_j(x + αp(x)) ≤ f_j(x) + c α τ(x)

hold for all α ∈ [0, α₀] and j = 1, 2, . . . , m.

To simplify the notation, we will omit the argument x^(k) and write C^(k) = C(x^(k)) for every quantity C depending on x^(k), e.g., ∇f_j^(k) = ∇f_j(x^(k)). Now we are ready to present the quasi-Newton’s algorithm for the multiobjective problem. At each step we solve the quasi-Newton’s direction problem (4) and obtain a quasi-Newton’s direction (6), which is a descent direction. Next, we determine the step length by the modified Armijo conditions (Theorem 3). At the end we update the point and the Hessian approximations by (3).
Quasi-Newton’s Algorithm for Multiobjective Optimization — QNMO

1. Initialization: Choose x^(0) ∈ U, symmetric positive definite matrices B_j^(0) ∈ R^{n×n}, j = 1, 2, . . . , m, c ∈ (0, 1) and a small value ε > 0. Define I = {1/2^n; n = 0, 1, 2, . . .} and set k := 0.
2. Main loop:
   (a) Descent direction: Solve the quasi-Newton’s direction problem (4). Obtain p^(k) and τ^(k) as in (6) and (5).
   (b) Termination: If τ^(k) > −ε, then stop. Else, proceed to step 2(c).
   (c) Line search: Choose α^(k) as the largest α ∈ I such that

       x^(k) + αp^(k) ∈ U,    f_j(x^(k) + αp^(k)) ≤ f_j^(k) + c α τ^(k),    j = 1, 2, . . . , m.

   (d) Update: Define x^(k+1) = x^(k) + α^(k) p^(k). Update B_j^(k+1) as in (3), j = 1, 2, . . . , m. Set k := k + 1 and go to step 2(a).
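A compact sketch of QNMO for m = 2 objectives may help fix ideas. It is not the paper's MATLAB implementation: the direction subproblem is solved by a dense grid over the dual multiplier λ (using the duality behind (8)–(13)), and the toy problem at the end is ours. The Pareto set of that example is the segment between the two minimizers.

```python
import numpy as np

def qnmo_direction(g1, g2, B1, B2, grid=1001):
    """Direction subproblem (4) for m = 2 via Lagrangian duality:
    tau = max over lam in [0,1] of -0.5 v^T H^{-1} v, where
    v = lam*g1 + (1-lam)*g2 and H = lam*B1 + (1-lam)*B2, and the primal
    solution is p = -H^{-1} v (cf. (8)-(13)). A dense grid over lam is a
    simple, inexact stand-in for a proper QP solver."""
    best_val, best_p = -np.inf, None
    for lam in np.linspace(0.0, 1.0, grid):
        v = lam * g1 + (1.0 - lam) * g2
        H = lam * B1 + (1.0 - lam) * B2
        Hv = np.linalg.solve(H, v)
        val = -0.5 * v @ Hv
        if val > best_val:
            best_val, best_p = val, -Hv
    return best_p, best_val

def qnmo(F, JF, x0, c=0.1, eps=1e-8, max_iter=100):
    """Sketch of QNMO for two strongly convex objectives.
    F(x) -> objective vector of length 2, JF(x) -> 2 x n Jacobian."""
    x = np.asarray(x0, dtype=float)
    Bs = [np.eye(x.size), np.eye(x.size)]      # B_j^(0) = I
    for _ in range(max_iter):
        G = JF(x)
        p, tau = qnmo_direction(G[0], G[1], Bs[0], Bs[1])
        if tau > -eps:                         # termination, step 2(b)
            break
        alpha = 1.0                            # Armijo search over I = {1/2^n}
        while not np.all(F(x + alpha * p) <= F(x) + c * alpha * tau):
            alpha *= 0.5
        s = alpha * p
        x_new = x + s
        G_new = JF(x_new)
        for j in range(2):                     # BFGS update (3), per objective
            y = G_new[j] - G[j]
            if s @ y > 1e-12:                  # curvature condition
                Bjs = Bs[j] @ s
                Bs[j] = (Bs[j] - np.outer(Bjs, Bjs) / (s @ Bjs)
                         + np.outer(y, y) / (s @ y))
        x = x_new
    return x

# toy bicriteria problem (ours, not from the paper): the Pareto set of
# f1 = ||x - a||^2, f2 = ||x - b||^2 is the segment between a and b.
a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
F = lambda x: np.array([np.sum((x - a) ** 2), np.sum((x - b) ** 2)])
JF = lambda x: np.vstack([2.0 * (x - a), 2.0 * (x - b)])
x_star = qnmo(F, JF, x0=[2.0, -1.0])
# converges to a point on the segment between a and b
```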
4. Technical results

Before we start with the analysis of convergence properties, we recall some results which confirm the superlinear convergence of the BFGS method. It is shown in [30] that

    lim_{k→∞} ∥(B^(k) − ∇²f(x∗)) s^(k)∥ / ∥s^(k)∥ = 0,

where f ∈ C²(U), ∇f(x∗) = 0, ∇²f(x∗) is positive definite for x∗ = lim_{k→∞} x^(k), s^(k) = x^(k+1) − x^(k) = α^(k) p^(k), and B^(k) is the BFGS approximation at iteration k. Unfortunately, this result cannot be used in our case, since there is no solution x∗ for which ∇f_j(x∗) = 0 holds for all j = 1, 2, . . . , m. Numerical results show that

    ∥(B_j^(k) − ∇²f_j^(k)) p^(k)∥ / ∥p^(k)∥ −→ 0    as k → ∞,    j = 1, 2, . . . , m.

Therefore, we assume that for some small ε ∈ R₊ there exists k₀ ∈ N such that for all k ≥ k₀

    ∥(B_j^(k) − ∇²f_j^(k)) p^(k)∥ / ∥p^(k)∥ < ε,    j = 1, 2, . . . , m.
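This assumption can be made plausible by a small experiment. For a quadratic objective f(x) = (1/2) x^T Q x, the gradient difference is y^(k) = Q s^(k) exactly, so the secant equation forces (B^(k+1) − ∇²f) s^(k) = 0: the ratio above vanishes along each step. A sketch with hypothetical random data:

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T Q x the gradient difference is
# y = Q s exactly, so the secant equation B_{k+1} s = y forces
# (B_{k+1} - Q) s = 0: the ratio vanishes along the most recent step.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4.0 * np.eye(4)       # constant Hessian, positive definite

B = np.eye(4)
for _ in range(5):
    s = rng.standard_normal(4)
    y = Q @ s                        # exact gradient difference
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
    ratio = np.linalg.norm((B - Q) @ s) / np.linalg.norm(s)
    # ratio is zero up to rounding after every update
```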
First, we recall Lemma 4.1 from [23], where the authors estimated the errors of approximating ∇f_j by its linear and f_j by its quadratic local model, j = 1, 2, . . . , m. To simplify later results, we use ε/2 instead of ε.

Lemma 4. Suppose that V ⊂ U is a convex set. Let δ, ε ∈ R₊ be such that for any x, y ∈ V with ∥y − x∥ < δ,

    ∥∇²f_j(y) − ∇²f_j(x)∥ < ε/2,    j = 1, 2, . . . , m,    (16)

follows. Under this assumption, for any x, y ∈ V such that ∥y − x∥ < δ we have

    ∥∇f_j(y) − (∇f_j(x) + ∇²f_j(x)(y − x))∥ < (ε/2) ∥y − x∥    (17)

and

    |f_j(y) − (f_j(x) + ∇f_j(x)^T (y − x) + (1/2)(y − x)^T ∇²f_j(x)(y − x))| < (ε/4) ∥y − x∥².    (18)

Now we modify Lemma 4 to estimate the error of the approximations when the BFGS approximation of the Hessian is used.
Lemma 5. Let V ⊂ U be a convex subset and δ, ε ∈ R₊ small constants such that for any x, y ∈ V with ∥y − x∥ < δ, (16) follows. Let {x^(k)} be a sequence generated by QNMO and {B_j^(k)}, j = 1, 2, . . . , m, the sequences of BFGS updates. We assume that for ε there exists k₀ ∈ N such that for all k ≥ k₀

    ∥(∇²f_j^(k) − B_j^(k))(y − x^(k))∥ / ∥y − x^(k)∥ < ε/2,    j = 1, 2, . . . , m.

Under these assumptions, for any x^(k), k ≥ k₀, and any y ∈ V such that ∥y − x^(k)∥ < δ we have

    ∥∇f_j(y) − (∇f_j^(k) + B_j^(k)(y − x^(k)))∥ < ε ∥y − x^(k)∥    (19)

and

    |f_j(y) − (f_j^(k) + ∇f_j^(k)T (y − x^(k)) + (1/2)(y − x^(k))^T B_j^(k)(y − x^(k)))| < (ε/2) ∥y − x^(k)∥²    (20)

for j = 1, 2, . . . , m.

Proof. By (17) we have

    ∥∇f_j(y) − (∇f_j^(k) + B_j^(k)(y − x^(k)))∥
    ≤ ∥∇f_j(y) − ∇f_j^(k) − ∇²f_j^(k)(y − x^(k))∥ + ∥(∇²f_j^(k) − B_j^(k))(y − x^(k))∥
    < (ε/2) ∥y − x^(k)∥ + (ε/2) ∥y − x^(k)∥ = ε ∥y − x^(k)∥,

so (19) holds. Similarly, by (18),

    |f_j(y) − (f_j^(k) + ∇f_j^(k)T (y − x^(k)) + (1/2)(y − x^(k))^T B_j^(k)(y − x^(k)))|
    ≤ |f_j(y) − f_j^(k) − ∇f_j^(k)T (y − x^(k)) − (1/2)(y − x^(k))^T ∇²f_j^(k)(y − x^(k))| + (1/2) |(y − x^(k))^T (∇²f_j^(k) − B_j^(k))(y − x^(k))|
    < (ε/4) ∥y − x^(k)∥² + (ε/4) ∥y − x^(k)∥² = (ε/2) ∥y − x^(k)∥²,

which confirms (20) and the proof is complete.
The next technical result provides estimates on τ.

Lemma 6. Let x ∈ U and a, b ∈ R₊ be such that a ≤ b. If

    aI ≤ B_j(x) ≤ bI,    j = 1, 2, . . . , m,

then the following two inequalities hold:
(a) (a/2) ∥p(x)∥² ≤ |τ(x)| ≤ (b/2) ∥p(x)∥²,
(b) |τ(x)| ≤ (1/(2a)) ∥Σ_{j=1}^m λ_j ∇f_j(x)∥² for all λ_j ≥ 0, j = 1, 2, . . . , m, with Σ_{j=1}^m λ_j = 1.

We omit the proof of this result. The reader should follow the proofs of Lemmas 4.2 and 4.3 of [23], replacing the Hessian matrices ∇²f_j(x) by the BFGS approximations B_j(x), j = 1, 2, . . . , m.

5. Convergence

Now we present the main theoretical result of this paper. In the following theorem we give sufficient conditions for local convergence.

Theorem 7. Let {x^(k)} denote a sequence generated by QNMO presented in Section 3. We assume that V ⊂ U is a convex set, c ∈ (0, 1), k₀ ∈ N, a, b, r, δ, ε ∈ R₊ and

(i) aI ≤ B_j(x) ≤ bI and aI ≤ ∇²f_j(x) ≤ bI for all x ∈ V, j = 1, 2, . . . , m,
(ii) ∥∇²f_j(y) − ∇²f_j(x)∥ < ε/2 for all x, y ∈ V with ∥y − x∥ < δ, j = 1, 2, . . . , m,
(iii) ∥(B_j^(k) − ∇²f_j^(k))(y − x^(k))∥ < (ε/2) ∥y − x^(k)∥ for all k ≥ k₀, y ∈ V, j = 1, 2, . . . , m,
(iv) B(x^(k₀), r) ⊂ V,
(v) ε/a ≤ 1 − c,
(vi) ∥p^(k₀)∥ < min{δ, r(1 − ε/a)}.
Then, for all k ≥ k₀,

(a) ∥x^(k) − x^(k₀)∥ ≤ ((1 − (ε/a)^{k−k₀}) / (1 − ε/a)) ∥p^(k₀)∥,
(b) ∥p^(k)∥ ≤ (ε/a)^{k−k₀} ∥p^(k₀)∥,
(c) α^(k) = 1,
(d) ∥p^(k+1)∥ ≤ (ε/a) ∥p^(k)∥,

and the sequence {x^(k)} converges to some local Pareto optimum x∗ ∈ R^n. The convergence is superlinear.

Proof. Let us first assume that (a) and (b) hold for some k. We will prove that then (c) and (d) are also true. By (a), (b) and the triangle inequality,

    ∥(x^(k) + p^(k)) − x^(k₀)∥ ≤ ((1 − (ε/a)^{k+1−k₀}) / (1 − ε/a)) ∥p^(k₀)∥.    (21)

Then, from assumptions (v) and (vi),

    x^(k), x^(k) + p^(k) ∈ B(x^(k₀), r)    and    ∥(x^(k) + p^(k)) − x^(k)∥ < δ

follow. Now we use Eq. (20) of Lemma 5 and assumptions (ii) and (iii). Then, for all j = 1, 2, . . . , m, we have

    f_j(x^(k) + p^(k)) ≤ f_j^(k) + ∇f_j^(k)T p^(k) + (1/2) p^(k)T B_j^(k) p^(k) + (ε/2) ∥p^(k)∥²
                     ≤ f_j^(k) + τ^(k) + (ε/2) ∥p^(k)∥²
                     = f_j^(k) + c τ^(k) + (1 − c) τ^(k) + (ε/2) ∥p^(k)∥².

The inequality τ^(k) ≤ 0 ((a) of Lemma 2), together with (a) of Lemma 6 and assumptions (i) and (v), implies

    (1 − c) τ^(k) + (ε/2) ∥p^(k)∥² ≤ ((−a(1 − c) + ε)/2) ∥p^(k)∥² ≤ 0.

Combining the above inequalities leads to

    f_j(x^(k) + p^(k)) ≤ f_j^(k) + c τ^(k)

for all j = 1, 2, . . . , m. Therefore, the Armijo conditions hold for α^(k) = 1 and we have proved (c). We set

    x^(k+1) := x^(k) + p^(k),    x^(k), x^(k+1) ∈ B(x^(k₀), r),    ∥x^(k+1) − x^(k)∥ < δ.
Next, define

    v^(k+1) := Σ_{j=1}^m λ_j^(k) ∇f_j^(k+1)

and use (b) of Lemma 6 with (9) and (10). Therefore, |τ^(k+1)| ≤ (1/(2a)) ∥v^(k+1)∥². Now we would like to estimate ∥v^(k+1)∥. For x ∈ U we define

    G^(k)(x) := Σ_{j=1}^m λ_j^(k) f_j(x)    and    H^(k) := Σ_{j=1}^m λ_j^(k) B_j^(k),

where λ_j^(k), j = 1, 2, . . . , m, are the KKT multipliers of (7). It is easy to see that

    ∇G^(k)(x) = Σ_{j=1}^m λ_j^(k) ∇f_j(x)    and    ∇²G^(k)(x) = Σ_{j=1}^m λ_j^(k) ∇²f_j(x).

Then v^(k+1) = ∇G^(k)(x^(k+1)) and from (13) we rewrite

    p^(k) = −(H^(k))⁻¹ ∇G^(k)(x^(k)).    (22)

By assumptions (ii) and (iii),

    ∥∇²G^(k)(y) − ∇²G^(k)(x)∥ < ε/2,
    ∥(H^(k) − ∇²G^(k)(x^(k)))(y − x^(k))∥ < (ε/2) ∥y − x^(k)∥
hold for all x, y ∈ V with ∥y − x∥ < δ and k ≥ k₀. This allows us to use inequality (19) from Lemma 5 to get

    ∥∇G^(k)(x^(k) + p^(k)) − (∇G^(k)(x^(k)) + H^(k) p^(k))∥ < ε ∥p^(k)∥.

Note that (22) is equivalent to ∇G^(k)(x^(k)) + H^(k) p^(k) = 0, which leads to ∥v^(k+1)∥ = ∥∇G^(k)(x^(k+1))∥ < ε ∥p^(k)∥ and

    |τ^(k+1)| ≤ (1/(2a)) ∥v^(k+1)∥² < (ε²/(2a)) ∥p^(k)∥².

Now we use (a) of Lemma 6 with assumption (i) and get (a/2) ∥p^(k+1)∥² < (ε²/(2a)) ∥p^(k)∥². Hence,

    ∥p^(k+1)∥ < (ε/a) ∥p^(k)∥

and item (d) holds.

The fact that (a) and (b) hold for all k will now be proven by induction. For k = k₀ they hold trivially. We have shown that if items (a) and (b) hold for some k, then items (c) and (d) also hold for the same k. Item (c) implies x^(k+1) = x^(k) + p^(k), and item (a) holds for k + 1 because of (21). Item (b) for k + 1 follows directly from item (d) and the inductive hypothesis. Therefore, for all k and l, where k > l ≥ k₀,

    ∥x^(k) − x^(l)∥ ≤ Σ_{i=l}^{k−1} ∥x^(i+1) − x^(i)∥ ≤ Σ_{i=l}^{k−1} ∥p^(i)∥ ≤ ∥p^(k₀)∥ Σ_{i=l}^{k−1} (ε/a)^{i−k₀},

where the last inequality follows from item (b). Hence, {x^(k)} is a Cauchy sequence and there exists x∗ ∈ R^n such that

    lim_{k→∞} x^(k) = x∗.
Because of item (b) and assumption (v), {∥p^(k)∥} converges to 0. By (i) and (a) of Lemma 6 we get that τ^(k) also converges to 0 as k → ∞. Since τ is a continuous function by item (c) of Lemma 2, we can interchange the limit and the function. Hence, τ(x∗) = 0. Therefore, in view of (b) of Lemma 2, x∗ is stationary for F. So, from (c) of Theorem 1, x∗ is a Pareto optimum for F restricted to V and hence a local Pareto optimum.

Finally, we prove the superlinear convergence of {x^(k)}. First, we define

    r^(k) = ∥p^(k₀)∥ (ε/a)^{k−k₀} / (1 − ε/a)    and    δ^(k) = ∥p^(k₀)∥ (ε/a)^{k−k₀}.

By the triangle inequality, assumptions (iv), (vi) and item (a) we get

    B(x^(k), r^(k)) ⊂ B(x^(k₀), r) ⊂ V.    (23)

Choose any ϕ ∈ R₊ and define

    ε̃ = min{a ϕ/(1 + 2ϕ), ε}.

For k ≥ k₀ the inequalities

    ∥∇²f_j(y) − ∇²f_j(x)∥ < ε̃/2    for all x, y ∈ B(x^(k), r^(k)) with ∥y − x∥ < δ^(k)    (24)

and

    ∥(B_j^(l) − ∇²f_j^(l))(y − x^(l))∥ < (ε̃/2) ∥y − x^(l)∥    for all y ∈ B(x^(k), r^(k)) and l ≥ k    (25)

hold, both for j = 1, 2, . . . , m. Assumptions (i)–(vi) are satisfied for ε̃, r^(k), δ^(k) and x^(k) instead of ε, r, δ and x^(k₀), respectively. This is true since (i) and (iv) follow from (23), (ii) and (iii) are inequalities (24) and (25), respectively, and (v) follows from the fact that (v) holds for ε, a and c. Assumption (vi) follows from the definitions of r^(k), δ^(k) and ε̃. We use (a) with index l instead of k. Therefore,

    ∥x^(l) − x^(k)∥ ≤ ∥p^(k)∥ (1 − (ε̃/a)^{l−k}) / (1 − ε̃/a).

Letting l → ∞ we get

    ∥x∗ − x^(k)∥ ≤ ∥p^(k)∥ / (1 − ε̃/a).
Using the last inequality and item (d), we have

    ∥x∗ − x^(k+1)∥ ≤ ∥p^(k+1)∥ / (1 − ε̃/a) ≤ ∥p^(k)∥ (ε̃/a) / (1 − ε̃/a).    (26)

By the triangle inequality and (26),

    ∥x∗ − x^(k)∥ ≥ ∥x^(k+1) − x^(k)∥ − ∥x∗ − x^(k+1)∥
               ≥ ∥p^(k)∥ − ∥p^(k)∥ (ε̃/a) / (1 − ε̃/a)
               = ∥p^(k)∥ (1 − 2ε̃/a) / (1 − ε̃/a).    (27)

From the definition of ε̃ we obtain 1 − 2ε̃/a > 0 and 1 − ε̃/a > 0. From this fact, (26) and (27), we arrive at

    ∥x∗ − x^(k+1)∥ ≤ ∥x∗ − x^(k)∥ (ε̃/a) / (1 − 2ε̃/a).

Combining this inequality with the definition of ε̃, we conclude

    ∥x∗ − x^(k+1)∥ ≤ ϕ ∥x∗ − x^(k)∥.

Since ϕ ∈ R₊ is arbitrarily chosen, the sequence {x^(k)} converges superlinearly to x∗.
6. Numerical results

The algorithm QNMO, presented in Section 3, was implemented in MATLAB and tested on some test problems known from the literature. All tests were completed under the same conditions. For each test problem we use box constraints of the form L ≤ x ≤ U. These constraints are incorporated into the direction search problem (7) in such a way that the new point lies in the same box, i.e., L ≤ x + p ≤ U holds. We use the stopping criterion τ(x^(k)) ≥ −ε, where ε ∈ R₊. For all test cases we used the same value ε = 5 × eps^{1/2} for the stopping criterion, where eps denotes the machine precision. The maximum number of iterations was set to 500, and the value c = 0.1 was used in the Armijo condition.

The test problems considered are presented in Table 1. The first column gives the name of the test problem, an abbreviation of the authors’ names and the number of the problem in the corresponding reference, which is indicated in the second column. Many problems are constructed in such a way that the dimension n of the variable space can be chosen arbitrarily; the same holds for the lower and upper bounds, denoted by L and U respectively. The choices of n, L and U are shown in the next three columns. The starting approximations B_j^(0), j = 1, 2, . . . , m, also affect the number of iterations and function evaluations. Here we use either the identity matrix, B_j^(0) = I, or the exact Hessian, B_j^(0) = ∇²f_j^(0), j = 1, 2, . . . , m. We compare the results of the QNMO method with both starting approximations against the Newton’s Algorithm for Multicriteria presented in [23]. The choice of starting approximation, or the Newton’s method (abbreviated NMO), is shown in the column approx. All problems were solved 100 times, with starting points randomly chosen from a uniform distribution between the lower and upper bounds. The results, the average number of iterations and the average number of function evaluations, are stated in the last two columns of Table 1, denoted by iter and eval, respectively.

Next, we discuss the test problems and the corresponding results. Problem SP is a strongly convex quadratic bicriteria test problem with 2 variables. It presents no difficulty to the methods proposed here. The Newton’s method and the quasi-Newton’s method with B_j^(0) = ∇²f_j^(0), j = 1, 2, . . . , m, converge in one step, while the quasi-Newton’s method with B_j^(0) = I needs more steps to converge, since the vector p^(0) obtained by (13) is not necessarily the best search direction. The results are presented in Fig. 1. Observe that the method is also robust with respect to extensions of the feasible region, determined by L and U.

Similarly, test problem JOS1 is a strongly convex quadratic bicriteria test problem where the number of variables n can be chosen arbitrarily. Its Hessian matrices are simply ∇²f_j(x) = (1/n) I, j = 1, 2. Therefore, the methods converge after at most two steps, and different dimensions of the variable space and feasible regions pose no problem. Since the Hessian matrices are simple to evaluate, their evaluation is faster than updating their approximations, as one can see in the second and third columns of Table 2. Hence, the Newton’s method is faster than the quasi-Newton’s method for this test problem.

A representative non-convex problem is PNR. Here, the methods find the set of global and the set of local Pareto optimal points, as presented in Fig. 2.

Problem JOS4 from [32] and problems ZDT1–ZDT4 from [34] were also tested. The first objective function of these problems is f_1(x) = x_1, whose gradient is ∇f_1(x) = [1, 0, . . . , 0]^T. Since y^(k) = ∇f_1^(k+1) − ∇f_1^(k) = 0, some objective functions are not strongly convex and the last term of the update formula (3) cannot be computed. Therefore, these problems cannot be solved by QNMO.
Table 1
Test problems with their main data and results.

Name  Source  n    L^T                  U^T                 approx       iter  eval
SP    [31]    2    −[5, 5]              [5, 5]              NMO          2.00  2.00
                                                            I            4.34  5.15
                                                            ∇²f_j^(0)    2.00  2.00
SP    [31]    2    −[50, 50]            [50, 50]            NMO          2.00  2.00
                                                            I            5.20  6.25
                                                            ∇²f_j^(0)    2.00  2.00
SP    [31]    2    −[100, 100]          [100, 100]          NMO          2.00  2.00
                                                            I            5.67  6.65
                                                            ∇²f_j^(0)    2.00  2.00
JOS1  [32]    50   −[2, . . . , 2]      [2, . . . , 2]      NMO          2.00  2.00
                                                            I            3.00  3.00
                                                            ∇²f_j^(0)    2.00  2.00
JOS1  [32]    100  −[2, . . . , 2]      [2, . . . , 2]      NMO          2.00  2.00
                                                            I            3.00  3.00
                                                            ∇²f_j^(0)    2.00  2.00
JOS1  [32]    200  −[2, . . . , 2]      [2, . . . , 2]      NMO          2.00  2.00
                                                            I            3.00  3.00
                                                            ∇²f_j^(0)    2.00  2.00
JOS1  [32]    100  −[50, . . . , 50]    [50, . . . , 50]    NMO          2.00  2.00
                                                            I            3.00  3.00
                                                            ∇²f_j^(0)    2.00  2.00
JOS1  [32]    100  −[100, . . . , 100]  [100, . . . , 100]  NMO          2.00  2.00
                                                            I            3.00  3.00
                                                            ∇²f_j^(0)    2.00  2.00
PNR   [33]    2    −[2, 2]              [2, 2]              NMO          2.61  9.19
                                                            I            4.32  9.60
                                                            ∇²f_j^(0)    2.96  8.96
FDS   [23]    10   −[2, . . . , 2]      [2, . . . , 2]      NMO          3.34  3.34
                                                            I            7.14  7.17
                                                            ∇²f_j^(0)    4.04  4.04
FDS   [23]    50   −[2, . . . , 2]      [2, . . . , 2]      NMO          3.76  3.76
                                                            I            9.52  9.56
                                                            ∇²f_j^(0)    5.18  5.18
FDS   [23]    100  −[2, . . . , 2]      [2, . . . , 2]      NMO          3.76  3.76
                                                            I            9.27  9.27
                                                            ∇²f_j^(0)    4.84  4.84

Fig. 1. Objective space of problem SP solved by method QNMO with B_j^(0) = I. Asterisks represent Pareto optimal points.
The three-criteria problem FDS is strongly convex. The results are visualized in Fig. 3. Here, updating the Hessian approximation is faster than evaluating the Hessian, as shown in the last two columns of Table 2.
Fig. 2. Objective space of problem PNR solved by method QNMO with B_j^(0) = I. Asterisks represent Pareto optimal points and circles with asterisks represent locally Pareto optimal points.
Fig. 3. The triangulation of the solution set in the objective space for problem FDS solved by method QNMO with B_j^(0) = I.
Table 2
Average elapsed time (in seconds) for the evaluation or the approximation of the Hessian matrices, as a function of the number of variables n, for test problems JOS1 and FDS. All tests were run 100 times.

n     JOS1 eval   JOS1 approx   FDS eval   FDS approx
50    0.00016     0.00016       0.00219    0.00127
100   0.00016     0.00328       0.00484    0.00453
150   0.00063     0.00764       0.02252    0.01028
200   0.00107     0.01641       0.06202    0.02137
300   0.00499     0.06999       0.30439    0.08816
400   0.00871     0.13413       0.78757    0.18401
500   0.01347     0.23278       1.57727    0.34102
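The trend in Table 2 can be reproduced qualitatively: evaluating a Hessian costs on the order of n extra gradient evaluations (here via finite differences), while one BFGS update is only a handful of O(n²) dense matrix operations. The following sketch uses a random strongly convex quadratic, not the paper's test problems; the problem size, the SPD construction, and the timing method are all assumptions made for illustration:

```python
import time
import numpy as np

n = 300
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)        # SPD matrix: f(x) = 0.5 x^T Q x

def grad(x):
    return Q @ x

def fd_hessian(x, h=1e-6):
    """Finite-difference Hessian: n additional gradient evaluations."""
    g0 = grad(x)
    H = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (grad(x + e) - g0) / h
    return H

def bfgs_update(B, s, y):
    """Rank-two BFGS update: a few dense matrix-vector operations."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

x = rng.standard_normal(n)
s = rng.standard_normal(n)
y = grad(x + s) - grad(x)          # exact curvature for a quadratic

t0 = time.perf_counter(); H = fd_hessian(x);                 t1 = time.perf_counter()
t2 = time.perf_counter(); B = bfgs_update(np.eye(n), s, y);  t3 = time.perf_counter()
print(f"Hessian evaluation: {t1 - t0:.5f} s, BFGS update: {t3 - t2:.5f} s")
```

The gap between the two timings widens as n grows, which matches the qualitative behavior of the FDS columns in Table 2; the JOS1 columns behave differently only because its Hessian (1/n) I is trivial to write down.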
7. Conclusions

We have proven that the quasi-Newton's method for multiobjective optimization converges superlinearly to a solution of the given problem if all functions involved are twice continuously differentiable and strongly convex. In a neighborhood of such a solution, the algorithm accepts the full Armijo step. The numerical experiments show that test problems with strongly convex objective functions are solved efficiently. Simple problems with little curvature present no challenge to the method; in this case the method is quite robust with respect to the number of variables and the starting point
chosen. The advantage of this method, compared to Newton's method for multiobjective optimization, is that the approximation of the Hessian matrices is usually considerably faster than their exact evaluation. This difference is especially noticeable as the dimension of the problem grows. The adaptation of the quasi-Newton's method for multiobjective optimization to other test problems, not considered here, and its hybridization with evolutionary multiobjective optimization methods are subjects of further research.

Acknowledgment

The operation was partly financed by the European Union, European Social Fund, Contract No. P-MR-08/28.

References

[1] B. Filipič, T. Tušar, E. Laitinen, Preliminary numerical experiments in multiobjective optimization of a metallurgical production process, Informatica 31 (2007) 233–240.
[2] M. Hasenjäger, B. Sendhoff, T. Sonoda, T. Arima, Three dimensional evolutionary aerodynamic design optimization using single and multi-objective approaches, in: R. Schilling, W. Haase, J. Periaux, H. Baier, G. Bugeda (Eds.), Evolutionary and Deterministic Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems EUROGEN 2005, FLM, Munich, 2005.
[3] S. Jaganathan, S. Palaniswami, G. Maharaja Vignesh, R. Mithunraj, Applications of multi objective optimization to reactive power planning problem using ant colony algorithm, Eur. J. Sci. Res. 51 (2011) 241–253.
[4] A. Lipej, Optimization method for the design of axial hydraulic turbines, J. Power Energ. A 218 (2004) 43–50.
[5] F. Petterson, N. Chakraborti, H. Saxén, A genetic algorithms based multiobjective neural net applied to noisy blast furnace data, Appl. Soft Comput. 7 (2007) 387–397.
[6] F. Petterson, N. Chakraborti, S.B. Singh, Neural networks analysis of steel plate processing augmented by multi-objective genetic algorithms, Steel Res. Int. 78 (2007) 890–898.
[7] R.E. Steuer, Y. Qi, M. Hirschberger, Portfolio selection in the presence of multiple criteria, in: C. Zopounidis, M. Doumpos, P.M. Pardalos (Eds.), Handbook of Financial Engineering, in: Springer Optimization and Its Applications, vol. 18, Springer, US, 2008, pp. 3–24.
[8] M.G.C. Tapia, C.A. Coello Coello, Applications of multi-objective evolutionary algorithms in economics and finance: a survey, in: IEEE Congress on Evolutionary Computation '07, pp. 532–539.
[9] K.L. Kiran, D. Jayachandran, S. Lakshminarayanan, Multi-objective optimization of cancer immuno-chemotherapy, in: R. Magjarevic, C.T. Lim, J.C.H. Goh (Eds.), 13th International Conference on Biomedical Engineering, vol. 23, Springer, Berlin, Heidelberg, 2009, pp. 1337–1340.
[10] K.L. Kiran, S. Lakshminarayanan, Treatment planning of cancer dendritic cell therapy using multi-objective optimization, in: Proceedings of ADCHEM 2009, Istanbul, Turkey.
[11] A. Petrovski, J. McCall, B. Sudha, Multi-objective optimization of cancer chemotherapy using swarm intelligence, in: AISB 2009 Symposium on Adaptive and Emergent Behaviour and Complex Systems.
[12] R. Janssen, M. van Herwijnen, T.J. Stewart, J.C.J.H. Aerts, Multiobjective decision support for land-use planning, Environ. Plann. B 35 (2008) 740–756.
[13] W. Jiang-Ping, T. Qun, Urban planning decision using multi-objective optimization algorithm, in: ISECS International Colloquium on Computing, Communication, Control, and Management, CCCM '08, 2008.
[14] S. Gass, T. Saaty, The computational algorithm for the parametric objective function, Nav. Res. Logist. Q. 2 (1955) 39–45.
[15] Y.Y. Haimes, L.S. Lasdon, D.A. Wismer, On a bicriterion formulation of the problems of integrated system identification and system optimization, IEEE Trans. Syst. Man Cybern. 1 (1971) 296–297.
[16] P.C. Fishburn, Lexicographic orders, utilities and decision rules: a survey, Manag. Sci. 20 (1974) 1442–1471.
[17] J. Fliege, B.F. Svaiter, Steepest descent method for multicriteria optimization, Math. Methods Oper. Res. 51 (2000) 479–494.
[18] L.M. Graña Drummond, A.N. Iusem, A projected gradient method for vector optimization problems, Comput. Optim. Appl. 28 (2004) 5–29.
[19] L.M. Graña Drummond, B.F. Svaiter, A steepest descent method for vector optimization, J. Comput. Appl. Math. 175 (2005) 395–414.
[20] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, Wiley-Interscience Series in Systems and Optimization, John Wiley & Sons, Chichester, 2001.
[21] J. Branke, K. Deb, K. Miettinen, R. Słowiński (Eds.), Multiobjective Optimization: Interactive and Evolutionary Approaches, in: State-of-the-Art Survey Series of the Lecture Notes in Computer Science, Springer-Verlag, Berlin, 2008.
[22] K. Miettinen, Nonlinear Multiobjective Optimization, in: International Series in Operations Research and Management Science, Kluwer Academic Publishers, Dordrecht, 1999.
[23] J. Fliege, L.M. Graña Drummond, B.F. Svaiter, Newton's method for multiobjective optimization, SIAM J. Optim. 20 (2009) 602–626.
[24] J. Nocedal, S.J. Wright, Numerical Optimization, second ed., Springer Science+Business Media, LLC, New York, 2006.
[25] C.G. Broyden, A new double-rank minimization algorithm, Not. Am. Math. Soc. 16 (1969) 670.
[26] R. Fletcher, A new approach to variable metric algorithms, Comput. J. 13 (1970) 317–322.
[27] D. Goldfarb, A family of variable-metric methods derived by variational means, Math. Comp. 24 (1970) 23–26.
[28] D.F. Shanno, Conditioning of quasi-Newton methods for function minimization, Math. Comp. 24 (1970) 647–656.
[29] J.E. Dennis, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, second ed., SIAM, Philadelphia, 1996.
[30] J.E. Dennis, J.J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev. 19 (1977) 46–89.
[31] M. Sefrioui, J. Periaux, Nash genetic algorithms: examples and applications, in: Proceedings of the 2000 Congress on Evolutionary Computation CEC00, IEEE Press, 2000, pp. 509–516.
[32] Y. Jin, M. Olhofer, B. Sendhoff, Dynamic weighted aggregation for evolutionary multi-objective optimization: why does it work and how?, in: Proceedings of the Genetic and Evolutionary Computation Conference GECCO, Morgan Kaufmann, 2001, pp. 1042–1049.
[33] M. Preuss, B. Naujoks, G. Rudolph, Pareto set and EMOA behavior for simple multimodal multiobjective functions, in: Parallel Problem Solving from Nature, PPSN IX, Springer, Berlin, 2006, pp. 513–522.
[34] E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results, Evol. Comput. 8 (2000) 173–195.