Applied Mathematics and Computation 263 (2015) 47–58

Contents lists available at ScienceDirect
Applied Mathematics and Computation
journal homepage: www.elsevier.com/locate/amc

Line search filter inexact secant methods for nonlinear equality constrained optimization ✩

Zhujun Wang a,∗, Li Cai b,c, Detong Zhu b

a College of Science, Hunan Institute of Engineering, Xiangtan 411104, China
b Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
c Shanghai Research Institute of Microwave Equipment, Shanghai 200063, China

MSC: 49M37, 90C30, 65K05

Keywords: Constrained optimization, Filter method, Inexact method, Secant method, Maratos effect

Abstract: We present inexact secant methods combined with a line search filter technique for solving nonlinear equality constrained optimization. For large-scale applications it is expensive to compute an exact search direction, and hence we use an inexact method that finds an approximate solution satisfying appropriate conditions. Global convergence of the proposed algorithm is established by means of the line search filter technique. A second-order correction step is used to overcome the Maratos effect, so that the line search filter inexact secant methods retain a superlinear local convergence rate. Finally, the results of numerical experiments indicate that the proposed methods are efficient for the given test problems.

© 2015 Elsevier Inc. All rights reserved.
1. Introduction

We consider the problem

min_{x ∈ R^n} f(x)  subject to  c(x) = 0,    (1.1)
where f: R^n → R and c: R^n → R^m are twice continuously differentiable. Fletcher and Leyffer [6] proposed a class of filter methods, which do not require any penalty parameter and show promising numerical results. Consequently, the filter technique has been employed in many approaches, for instance SLP methods [4], SQP methods [5,12], interior point approaches [13], bundle techniques [9], and so on. In [10,11], Wächter and Biegler presented a line search filter method for nonlinear equality constrained programming and established its global convergence properties and superlinear local convergence rate.
Secant methods (two-step algorithms), as defined in [7], are among the most successful methods for solving problem (1.1); their main advantage rests in the use of an orthogonal projection operator that is continuous. The basic idea of the secant algorithms can be summarized as follows: at each iteration, the total step d_k is the sum of a horizontal step u_k and a vertical step v_k. The general form of the secant algorithms is, at each iteration,
λ_{k+1} = U(x_k, λ_k, W_k),    (1.2a)
✩ The authors gratefully acknowledge the partial support of the National Science Foundation Grant (11371253) of China and the Science Foundation Grant (11C0336) of the Provincial Education Department of Hunan.
∗ Corresponding author. Tel.: +86 21 64323361.
E-mail addresses: [email protected] (Z. Wang), [email protected] (L. Cai), [email protected] (D. Zhu).
http://dx.doi.org/10.1016/j.amc.2015.04.016
W_k û_k = −∇_x L(x_k, λ_{k+1}),    (1.2b)

u_k = P_k û_k,    (1.2c)

v_k = −∇c_k^† c_k,    (1.2d)

y_k = ∇_x L(x_k + u_k, λ_{k+1}) − ∇_x L(x_k, λ_{k+1}),    (1.2e)

W_{k+1} = B(u_k, y_k, W_k),    (1.2f)

x_{k+1} = x_k + u_k + v_k,    (1.2g)
where ∇c_k^† is the pseudo-inverse of ∇c_k^T, L is the Lagrangian function defined for x ∈ R^n and λ ∈ R^m by L(x, λ) = f(x) + λ^T c(x), and B is a secant update formula generating matrices W_k that approximate the Hessian of the Lagrangian at each x_k. The projection onto the null space of ∇c(x)^T can be either the orthogonal projection P(x) given by

P(x) = I − ∇c(x)[∇c(x)^T ∇c(x)]^{-1} ∇c(x)^T    (1.3a)

or the oblique projection defined by

P(x) = I − W^{-1} ∇c(x)[∇c(x)^T W^{-1} ∇c(x)]^{-1} ∇c(x)^T.    (1.3b)
The multiplier update U(x_k, λ_k, W_k) in (1.2a) can be chosen from one of the following:

λ^P_{k+1} = −[∇c_k^T ∇c_k]^{-1} ∇c_k^T ∇f_k,    (1.4a)

λ^S_{k+1} = −[∇c_k^T W_k^{-1} ∇c_k]^{-1} ∇c_k^T W_k^{-1} ∇f_k,    (1.4b)

λ^N_{k+1} = [∇c_k^T W_k^{-1} ∇c_k]^{-1} (c_k − ∇c_k^T W_k^{-1} ∇f_k),    (1.4c)

where λ^P_{k+1}, λ^S_{k+1}, λ^N_{k+1} are the projection update, the null-space update and the Newton update, respectively. The pseudo-inverse ∇c_k^† of ∇c_k^T will be either
∇c_k^† = ∇c_k [∇c_k^T ∇c_k]^{-1}    (1.5a)

or

∇c_k^† = W_k^{-1} ∇c_k [∇c_k^T W_k^{-1} ∇c_k]^{-1}.    (1.5b)
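For concreteness, the following sketch shows one way the building blocks (1.3)–(1.5) could be formed with dense linear algebra. It is only an illustration under our own choices (Python with numpy, explicit inverses suitable only for small problems); the paper itself prescribes just the formulas above.

```python
import numpy as np

def secant_blocks(g, c, G, W):
    """Illustrative dense evaluation of (1.3)-(1.5).

    g : gradient of f at x_k, shape (n,)
    c : constraint values c(x_k), shape (m,)
    G : the matrix of constraint gradients (columns), shape (n, m)
    W : secant approximation W_k, shape (n, n), assumed symmetric positive definite
    """
    n = len(g)
    Winv = np.linalg.inv(W)  # explicit inverse only for illustration

    # Projections onto the null space of the constraint Jacobian:
    P_orth = np.eye(n) - G @ np.linalg.solve(G.T @ G, G.T)                # (1.3a)
    P_obl = np.eye(n) - Winv @ G @ np.linalg.solve(G.T @ Winv @ G, G.T)   # (1.3b)

    # Multiplier updates:
    lam_P = -np.linalg.solve(G.T @ G, G.T @ g)                            # (1.4a)
    lam_S = -np.linalg.solve(G.T @ Winv @ G, G.T @ Winv @ g)              # (1.4b)
    lam_N = np.linalg.solve(G.T @ Winv @ G, c - G.T @ Winv @ g)           # (1.4c)

    # Pseudo-inverses, returned as n x m matrices:
    pinv_a = G @ np.linalg.inv(G.T @ G)                                   # (1.5a)
    pinv_b = Winv @ G @ np.linalg.inv(G.T @ Winv @ G)                     # (1.5b)

    return P_orth, P_obl, lam_P, lam_S, lam_N, pinv_a, pinv_b
```

With these pieces, the vertical step (1.2d) is just v_k = −(pseudo-inverse) @ c.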
A drawback of many contemporary SQP algorithms is that they require explicit representations of exact derivative information and the solution of one or more linear systems during every iteration. For large-scale applications, this information may be too expensive to obtain. In [2,3], Byrd et al. presented inexact methods and gave conditions that guarantee the global convergence of inexact steps.
Motivated by the ideas and methods above, we propose an inexact secant method in association with a line search filter technique that enjoys both global convergence and a superlinear local convergence rate. The method is globalized by a line search and filter approach. It has been noted by Fletcher and Leyffer [6] that the filter approach, like a penalty function approach, can suffer from the Maratos effect; hence it is necessary to overcome the Maratos effect by computing a second order correction (SOC) step in our algorithm. In addition, it is shown that the iterates generated by our algorithm converge superlinearly.
The paper is organized as follows. In Section 2 the algorithm is developed. Section 3 analyzes the global convergence properties. We characterize the local superlinear convergence in Section 4. The results of numerical experience with the method are discussed in Section 5.

2. Algorithm

Throughout the paper, we denote the Euclidean vector or matrix norm by ‖·‖ and the l1 norm by ‖·‖_1, and we write A(x) := ∇c(x), A(x)^† := ∇c(x)^† and g(x) := ∇f(x). We compute the horizontal step u_k inexactly, which means that (1.2b) is replaced by
W_k û_k = −∇_x L(x_k, λ_{k+1}) + r_k,    (2.1)

where ‖r_k‖ ≤ η_k ‖c_k‖ with η_k ∈ [0, t] and t ∈ (0, 1). We use the norm of c_k to control the residual because lim_{k→∞} c(x_k) = 0, which will be shown in Theorem 3.1.
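As an illustration of how the bound on ‖r_k‖ might be enforced in practice, the sketch below solves (2.1) with a conjugate-gradient loop that is truncated as soon as the residual passes the test. The choice of CG (which requires W_k positive definite) is ours; the paper only requires an approximate solution obeying the residual bound.

```python
import numpy as np

def inexact_horizontal_solve(W, rhs, norm_c, eta, max_iter=200):
    """Approximately solve W u_hat = rhs (cf. (2.1)) by conjugate gradients,
    stopping once the residual satisfies ||r|| <= eta * ||c_k||.
    Any inexact solver with this residual test fits the framework."""
    u = np.zeros_like(rhs)
    r = rhs - W @ u                 # residual of the linear system
    p = r.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eta * norm_c:
            break                   # inexactness condition satisfied
        Wp = W @ p
        alpha = (r @ r) / (p @ Wp)
        u += alpha * p
        r_new = r - alpha * Wp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return u, r                     # r plays the role (up to sign) of r_k in (2.1)
```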
Since an l1 penalty function is adopted as a merit function, it is necessary to overcome the Maratos effect by computing an SOC step d̂_k in our algorithm. In this work, d̂_k can be either

d̂_k = −A_k [A_k^T A_k]^{-1} c(x_k + d_k)    (2.2a)
or

d̂_k = −W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} c(x_k + d_k).    (2.2b)

Table 1
Algorithms.

Algorithm   λ_{k+1}   P(x)     A_k^†    d̂_k
1           (1.4b)    I        (1.5b)   (2.2b)
2           (1.4a)    (1.3b)   (1.5b)   (2.2a)
3           (1.4c)    (1.3a)   (1.5a)   (2.2b)
4           (1.4b)    I        (1.5a)   (2.2b)
5           (1.4a)    (1.3b)   (1.5a)   (2.2a)
6           (1.4c)    (1.3a)   (1.5b)   (2.2b)

Table 1 lists six related algorithms with different choices of P(x), λ_{k+1}, A_k^† and d̂_k.
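A direct transcription of the two correction formulas, again with dense factorizations chosen purely for illustration:

```python
import numpy as np

def soc_step(G, c_trial, W=None):
    """Second order correction step of (2.2).

    G       : constraint gradient matrix at x_k, shape (n, m)
    c_trial : c(x_k + d_k), shape (m,)
    W       : pass W_k to select the variant (2.2b); W=None selects (2.2a).
    """
    if W is None:
        return -G @ np.linalg.solve(G.T @ G, c_trial)        # (2.2a)
    WinvG = np.linalg.solve(W, G)
    return -WinvG @ np.linalg.solve(G.T @ WinvG, c_trial)    # (2.2b)
```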
In order to obtain the next iterate, we use the line search filter approach to find a step size α_k ∈ (0, 1]. From the filter idea, we know that a nonlinear programming algorithm must deal with two different criteria, related to optimality and to feasibility, respectively. Optimality is measured by the objective function f; feasibility is typically measured by a penalization of the constraint violation, for instance by the function h given by h(x) = ‖c(x)‖, where ‖·‖ is the ordinary Euclidean norm. We say that a trial point is acceptable for the filter if and only if

h(x_k + α_{k,l} d_k) ≤ (1 − γ_h) h(x_k)    (2.3a)

or

f(x_k + α_{k,l} d_k) ≤ f(x_k) − γ_f h(x_k),    (2.3b)

where the constants γ_h, γ_f ∈ (0, 1). Even when (2.3a) or (2.3b) holds, the trial point must in addition satisfy the Armijo-type condition (2.5) whenever the f-type switching condition

α_{k,l} g_k^T d_k < 0  and  [−α_{k,l} g_k^T d_k]^{s_f} α_{k,l}^{1−s_f} > τ [h(x_k)]^{s_h}    (2.4)

holds, where the constants satisfy s_f ≥ 1, s_h > 1 and τ > 0. The Armijo-type condition is

f(x_k + α_{k,l} d_k) ≤ f(x_k) + β α_{k,l} g_k^T d_k,    (2.5)

where β ∈ (0, 1/2). In this paper, we employ the definition of the filter in [10]. The filter is defined as a set F_k ⊆ [0, ∞) × R containing all (h, f)-pairs that are prohibited in iteration k. If (h(x_k + α_{k,l} d_k), f(x_k + α_{k,l} d_k)) ∉ F_k, then the trial point is acceptable to the filter. We use the updating formula

F_{k+1} = F_k ∪ {(h, f) ∈ R^2: h ≥ (1 − γ_h) h(x_k) and f ≥ f(x_k) − γ_f h(x_k)}    (2.6)

to augment the filter, as in [6].
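The filter itself is just a collection of "corner" pairs. A minimal sketch of the acceptability test and the update (2.6), with a plain Python list as our own choice of data structure:

```python
class Filter:
    """Sketch of the (h, f)-filter used in (2.3) and (2.6)."""

    def __init__(self, nu):
        # initial filter F_0 = {(h, f): h >= nu}, cf. Step 1 of Algorithm A below
        self.entries = [(nu, -float("inf"))]

    def acceptable(self, h, f):
        # a pair is prohibited if some entry dominates it in both coordinates
        return all(h < h_j or f < f_j for (h_j, f_j) in self.entries)

    def augment(self, h_k, f_k, gamma_h, gamma_f):
        # filter update (2.6)
        self.entries.append(((1 - gamma_h) * h_k, f_k - gamma_f * h_k))
```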
We now formally state the global inexact line search filter secant method for solving (1.1).

Algorithm A

1. Initialize. Choose a starting point x_0 and λ_0. Set constants β ∈ (0, 1/2), γ_h, γ_f ∈ (0, 1), τ, ν > 0, s_h > 1, s_f > 2s_h, γ_α ∈ (0, 1], ς ∈ (0, 1) and ε > 0. Choose the initial filter F_0 := {(h, f) ∈ R^2: h ≥ ν} and let k := 0.
2. If ‖∇_x L(x_k, λ_k)‖_1 + ‖c_k‖_1 < ε, then stop.
3. Evaluate f_k, g_k, c_k and A_k. Then compute the multiplier λ_{k+1} = U(x_k, λ_k, W_k) from (1.4). Find an inexact step d_k from

W_k û_k = −∇_x L(x_k, λ_{k+1}) + r_k,    (2.7a)

u_k = P_k û_k,    (2.7b)

v_k = −A_k^† c_k,    (2.7c)

d_k = u_k + v_k,    (2.7d)

satisfying

‖r_k‖ ≤ η_k ‖c_k‖ with η_k ∈ [0, t] and t ∈ (0, (1 − γ_h)/(1 + γ_h)).    (2.8)

In (2.7), P_k is given by (1.3) and A_k^† is given by (1.5). If (2.7a) is too ill-conditioned, go to Step 11.
4. Set α_{k,0} = 1 and l = 0.
5. If α_{k,l} < α_k^min with

α_k^min := γ_α · min{ γ_h, γ_f h_k/(−g_k^T d_k), τ h_k^{s_h}/(−g_k^T d_k)^{s_f} }  if g_k^T d_k < 0,  and  α_k^min := γ_α otherwise,    (2.9)
go to Step 11. Otherwise, if (h(x_k + α_{k,l} d_k), f(x_k + α_{k,l} d_k)) ∈ F_k, go to Step 7.
6. If (2.4) holds, α_{k,l} is an f-step size; then, if (2.5) holds, set x_{k+1} := x_k + α_{k,l} d_k and go to Step 10. On the other hand, if (2.4) does not hold, then, if (2.3) holds, set x_{k+1} := x_k + α_{k,l} d_k and go to Step 10.
7. If l ≠ 0, go to Step 9. Otherwise, compute the second order correction step d̂_k by (2.2). Then, if (h(x_k + d_k + d̂_k), f(x_k + d_k + d̂_k)) ∈ F_k, go to Step 9.
8. If (2.4) holds with x_k + d_k and f(x_k + d_k + d̂_k) ≤ f_k + β g_k^T d_k holds, then set x_{k+1} := x_k + d_k + d̂_k and go to Step 10. If (2.4) does not hold with x_k + d_k, then, if h(x_k + d_k + d̂_k) ≤ (1 − γ_h) h(x_k) or f(x_k + d_k + d̂_k) ≤ f(x_k) − γ_f h(x_k) holds, set x_{k+1} := x_k + d_k + d̂_k and go to Step 10.
9. Set α_{k,l+1} = ς α_{k,l} and l ← l + 1. Go to Step 5.
10. Set α_k := α_{k,l}. If (2.4) does not hold for α_k, augment the filter by (2.6); otherwise set F_{k+1} := F_k. Go to Step 12.
11. Feasibility restoration phase. Find a new point x_{k+1} such that (a) h(x_{k+1}) is smaller than h(x_k); (b) (2.3) holds for x_{k+1}; (c) (h(x_{k+1}), f(x_{k+1})) ∉ F_k. Augment the filter by (2.6). Go to Step 12.
12. Compute y_k = P_k [∇_x L(x_k + u_k, λ_{k+1}) − ∇_x L(x_k, λ_{k+1})], and update W_k by W_{k+1} = BFGS(u_k, y_k, W_k). Set k := k + 1 and go to Step 2.

Remark 1. The backtracking line search technique may cause lim_{l→∞} α_{k,l} = 0. In the case that h_k > 0, it can be seen from (2.9) that α_k^min > 0; hence the algorithm switches to the feasibility restoration phase if α_{k,l} < α_k^min in Step 5. In the case that h_k = 0, (2.9) implies α_k^min = 0, and the Armijo condition (2.5) can be satisfied for a sufficiently small step size α_{k,l}.

Remark 2. The restoration phase algorithm converges to a stationary point of the constraint violation, assuming that a suitable method is used.
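To make the control flow of Steps 4–10 and 12 concrete, here is a compact sketch of the backtracking loop with the safeguard (2.9), together with the quasi-Newton update of Step 12. The second order correction of Steps 7–8 is omitted for brevity, the packaging of the constants in a small namespace and the Powell-style damping safeguard in the BFGS update are our own illustrative choices, and `filt` is a `Filter` as sketched above.

```python
import numpy as np

def line_search(x, d, f, h, g_k, filt, p):
    """Sketch of Steps 4-6 and 9-10 of Algorithm A. f, h are callables,
    g_k = g(x_k), and p holds beta, gamma_h, gamma_f, tau, s_h, s_f,
    gamma_alpha, sigma (e.g. a types.SimpleNamespace). Returns
    (x_next, augment_filter); x_next = None signals Step 11 (restoration)."""
    gTd = float(g_k @ d)
    hk, fk = h(x), f(x)
    if gTd < 0:                                   # alpha_min from (2.9)
        a_min = p.gamma_alpha * min(p.gamma_h,
                                    p.gamma_f * hk / (-gTd),
                                    p.tau * hk ** p.s_h / (-gTd) ** p.s_f)
    else:
        a_min = p.gamma_alpha
    alpha = 1.0                                   # Step 4
    for _ in range(60):                           # cap guards the h_k = 0 case (Remark 1)
        if alpha < a_min:
            break                                 # Step 5
        xt = x + alpha * d
        ht, ft = h(xt), f(xt)
        if filt.acceptable(ht, ft):
            if gTd < 0 and alpha * (-gTd) ** p.s_f > p.tau * hk ** p.s_h:  # (2.4)
                if ft <= fk + p.beta * alpha * gTd:                        # (2.5)
                    return xt, False              # f-type step: keep filter unchanged
            elif ht <= (1 - p.gamma_h) * hk or ft <= fk - p.gamma_f * hk:  # (2.3)
                return xt, True                   # h-type step: augment filter by (2.6)
        alpha *= p.sigma                          # Step 9
    return None, True                             # Step 11: feasibility restoration

def bfgs_update(W, u, y, damping=0.2):
    """Step 12: W_{k+1} = BFGS(u_k, y_k, W_k), for u_k != 0. The damping that
    keeps W positive definite is a common safeguard, not specified in the paper."""
    Wu = W @ u
    uWu = u @ Wu
    uy = u @ y
    if uy < damping * uWu:                        # Powell damping
        theta = (1.0 - damping) * uWu / (uWu - uy)
        y = theta * y + (1.0 - theta) * Wu
        uy = u @ y
    return W - np.outer(Wu, Wu) / uWu + np.outer(y, y) / uy
```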
3. Global convergence

We denote by R the set of iterations in which the restoration phase is invoked in the algorithm. The set R̄ is defined as the set of iterations in which the restoration phase is invoked because the system (2.7) is too ill-conditioned. We denote the set of iterations in which the filter has been augmented by A. The criticality measure is defined as

χ(x_k) := ‖u_k‖^2  if k ∉ R̄,  and  χ(x_k) := ∞  otherwise.    (3.1)
We begin the global convergence analysis of the proposed algorithm under the following assumptions.

Assumptions G.

G1. f and c are bounded and twice continuously differentiable on R^n.
G2. There exists a constant M_h > 0 such that k ∉ R̄ whenever h_k ≤ M_h.
G3. There exists a constant κ > 0 such that u^T W_k u ≥ κ ‖u‖^2 for all u with A_k^T u = 0, and {W_k} are uniformly bounded.
G4. The A_k have full row rank and their smallest singular values are bounded below by some positive constant.
G5. The sequences {A_k^†} are uniformly bounded.

These assumptions are fairly standard for a line search method and similar to those made in [10]. From Assumptions G1, G3 and G4, we can easily prove the next lemma.

Lemma 3.1. Suppose Assumptions G hold. Then u_k, v_k and λ_{k+1} are bounded.

Next, we prove that the search direction is a direction of sufficient descent for f at points that are sufficiently close to feasible and nonoptimal.

Lemma 3.2. Suppose Assumptions G hold. If there exists a subsequence of iterates {x_{k_i}} such that χ(x_{k_i}) ≥ ε with ε > 0, then there exist constants ε_1, ε_2 > 0 satisfying

h(x_{k_i}) ≤ ε_1 ⇒ g_{k_i}^T d_{k_i} ≤ −ε_2  for all i.    (3.2)
Proof. If there exists a subsequence of iterates {x_{k_i}} such that χ(x_{k_i}) ≥ ε with ε > 0, then by G2 we have k_i ∉ R̄. From (2.7c) and Assumptions G4 and G5, we obtain

‖v_k‖ = O(‖c_k‖).    (3.3)

For Algorithms 1 and 4, we have

u_k = û_k = −W_k^{-1} g_k + W_k^{-1} A_k λ_{k+1} + W_k^{-1} r_k = −W_k^{-1} g_k + W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} g_k + W_k^{-1} r_k.

Hence

u_k^T W_k u_k = u_k^T ( −g_k + A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} g_k + r_k ) = −u_k^T g_k + g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} r_k + u_k^T r_k.

So

g_k^T u_k = −u_k^T W_k u_k + g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} r_k + u_k^T r_k.    (3.4)

By (2.8), (3.3), (3.4), Assumption G3 and χ(x_{k_i}) ≥ ε, we have, for k_i ∉ R̄,

g_{k_i}^T d_{k_i} = g_{k_i}^T u_{k_i} + g_{k_i}^T v_{k_i}
  = −u_{k_i}^T W_{k_i} u_{k_i} + g_{k_i}^T W_{k_i}^{-1} A_{k_i} [A_{k_i}^T W_{k_i}^{-1} A_{k_i}]^{-1} A_{k_i}^T W_{k_i}^{-1} r_{k_i} + u_{k_i}^T r_{k_i} + g_{k_i}^T v_{k_i}
  ≤ −κ ‖u_{k_i}‖^2 + κ_1 ‖u_{k_i}‖ h_{k_i} + κ_2 h_{k_i}    (3.5)

for some constants κ_1, κ_2 > 0. Define ε_1 := min{ M_h, κε/(2(κ_1 + κ_2)) }. Then, if h(x_{k_i}) ≤ ε_1, we obtain from (3.5) that

g_{k_i}^T d_{k_i} ≤ −(κ/2) χ(x_{k_i}) ≤ −κε/2.

Hence (3.2) holds with ε_2 := κε/2.

For Algorithms 2 and 5, we have û_k = −W_k^{-1} g_k + W_k^{-1} A_k [A_k^T A_k]^{-1} A_k^T g_k + W_k^{-1} r_k. Then
u_k = P_k û_k = −W_k^{-1} g_k + W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} g_k + P_k W_k^{-1} r_k,

with P_k given by (1.3b). Hence

g_k^T u_k = −u_k^T W_k u_k + u_k^T W_k P_k W_k^{-1} r_k.    (3.6)

Similarly, we get, for k_i ∉ R̄,

g_{k_i}^T d_{k_i} = g_{k_i}^T u_{k_i} + g_{k_i}^T v_{k_i} = −u_{k_i}^T W_{k_i} u_{k_i} + u_{k_i}^T W_{k_i} P_{k_i} W_{k_i}^{-1} r_{k_i} + g_{k_i}^T v_{k_i}
  ≤ −κ ‖u_{k_i}‖^2 + κ̂_1 ‖u_{k_i}‖ h_{k_i} + κ̂_2 h_{k_i}

for some constants κ̂_1, κ̂_2 > 0. We can again define ε_1 := min{ M_h, κε/(2(κ̂_1 + κ̂_2)) }. Then, if h(x_{k_i}) ≤ ε_1, we obtain from (3.6) that

g_{k_i}^T d_{k_i} ≤ −(κ/2) χ(x_{k_i}) ≤ −κε/2.

Hence (3.2) holds with ε_2 := κε/2. For Algorithms 3 and 6, we have
û_k = −W_k^{-1} g_k + W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} ∇f_k − W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} c_k + W_k^{-1} r_k.

Then

u_k = P_k û_k = ( I − A_k [A_k^T A_k]^{-1} A_k^T ) û_k
  = −W_k^{-1} g_k + W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} g_k − W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} c_k + A_k [A_k^T A_k]^{-1} c_k + P_k W_k^{-1} r_k
  = û_k − W_k^{-1} r_k − v_k + P_k W_k^{-1} r_k,

with P_k given by (1.3a). Hence

u_k^T W_k u_k = u_k^T W_k û_k − u_k^T r_k − u_k^T W_k v_k + u_k^T W_k P_k W_k^{-1} r_k = −u_k^T g_k − u_k^T W_k v_k + u_k^T W_k P_k W_k^{-1} r_k.

So

g_k^T u_k = −u_k^T W_k u_k − u_k^T W_k v_k + u_k^T W_k P_k W_k^{-1} r_k.    (3.7)

Similarly, there exist constants ε_1 and ε_2 such that (3.2) holds. □

We note that the result (3.2) is precisely that established in Lemma 2 of [10]. Therefore, under Assumptions G we can apply the same analysis as in [10] to obtain the following global convergence result.

Theorem 3.1. Suppose Assumptions G hold. Then
lim_{k→∞} h(x_k) = 0  and  lim inf_{k→∞} χ(x_k) = 0.
4. Local convergence

We now turn to establishing superlinear convergence of our algorithm. As the filter approach may suffer from the Maratos effect, it is necessary to overcome it by computing the second order correction step d̂_k in our algorithm. Assuming that the sequence {x_k} generated by the proposed algorithm converges to a local solution x∗, we make the following assumptions, which are used in the local convergence analysis.

Assumptions L. The sequences {x_k}, {λ_k} generated by the algorithm are contained in a convex set and the following properties hold:

L1. f and c are twice continuously differentiable in a neighborhood of x∗.
L2. There exists a constant M_h > 0 such that k ∉ R̄ whenever h_k ≤ M_h.
L3. A(x∗) has full row rank.
L4. W_k is uniformly positive definite on the null space of A_k^T, and W∗ is positive definite on the null space of A(x∗)^T. The sequence {W_k} is uniformly bounded.
L5. W_k satisfies

lim_{k→∞} ‖[W_k − ∇^2_{xx} L(x_k, λ_{k+1})] d_k‖ / ‖d_k‖ = 0.    (4.1)
Although the proofs of the following results are essentially similar to those presented in [11], we include them here for completeness. We define the exact penalty function

φ_ω(x) := f(x) + ω h(x),    (4.2)

and use it in the local convergence analysis. The local linearized approximation of the penalty function is

q_ω(x_k, d) = f(x_k) + g_k^T d + (1/2) d^T W_k d + ω ‖A_k^T d + c(x_k)‖.    (4.3)

Lemma 4.1. Suppose Assumptions L hold and choose η_k = O(‖d_k‖). Then the correction vector d̂_k given in (2.2) satisfies

‖d̂_k‖ = O(‖d_k‖^2)  and  c(x_k + d_k + d̂_k) = o(‖d_k‖^2).    (4.4)

Proof. From (2.7) and η_k = O(‖d_k‖), we have

c(x_k + d_k) = c_k + A_k^T d_k + O(‖d_k‖^2) = π r_k + O(‖d_k‖^2) = O(‖d_k‖^2),

where π > 0. Hence, by (2.8), we obtain ‖d̂_k‖ = O(‖d_k‖^2). Similarly, we get

c(x_k + d_k + d̂_k) = c(x_k + d_k) + A(x_k + d_k)^T d̂_k + O(‖d̂_k‖^2)
  = [A(x_k + d_k) − A(x_k)]^T d̂_k + O(‖d̂_k‖^2) = o(‖d̂_k‖) + O(‖d̂_k‖^2) = o(‖d_k‖^2). □
Lemma 4.2. If Assumptions L hold, then there exists a constant γ̃_ω > 0 such that, when ω ≥ γ̃_ω,

q_ω(x_k, 0) − q_ω(x_k, d_k) ≥ 0.    (4.5)

Proof. For Algorithms 1 and 4, (3.4) implies that

g_k^T d_k = g_k^T u_k + g_k^T v_k = −u_k^T W_k u_k + g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} r_k + u_k^T r_k + g_k^T v_k.    (4.6)

From (4.3) we have, for some π > 0, that q_ω(x_k, 0) − q_ω(x_k, d_k) = −g_k^T d_k − (1/2) d_k^T W_k d_k + ω(‖c_k‖ − π‖r_k‖). Hence

q_ω(x_k, 0) − q_ω(x_k, d_k) = (1/2) u_k^T W_k u_k − (1/2) v_k^T W_k v_k − u_k^T W_k v_k − g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} r_k − u_k^T r_k − g_k^T v_k + ω(‖c_k‖ − ‖r_k‖).    (4.7)

Since ‖r_k‖ = O(‖c_k‖), ‖v_k‖ = O(‖c_k‖) and u_k, v_k, W_k, λ_{k+1} are all bounded, we get, by Assumption L4,

q_ω(x_k, 0) − q_ω(x_k, d_k) ≥ −(1/2) v_k^T W_k v_k − u_k^T W_k v_k − g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} A_k^T W_k^{-1} r_k − u_k^T r_k − g_k^T v_k + ω(‖c_k‖ − ‖r_k‖)
  ≥ ω(1 − t) ‖c_k‖ − κ_3 ‖c_k‖,

where κ_3 > 0. Let γ̃_ω := κ_3/(1 − t). Then (4.5) holds for ω ≥ γ̃_ω.

For Algorithms 2 and 5, (3.6) implies that

g_k^T d_k = g_k^T u_k + g_k^T v_k = −u_k^T W_k u_k + u_k^T W_k P_k W_k^{-1} r_k + g_k^T v_k.    (4.8)

Hence

q_ω(x_k, 0) − q_ω(x_k, d_k) = (1/2) u_k^T W_k u_k − (1/2) v_k^T W_k v_k − u_k^T W_k v_k − u_k^T W_k P_k W_k^{-1} r_k − g_k^T v_k + ω(‖c_k‖ − ‖r_k‖) ≥ ω(1 − t) ‖c_k‖ − κ̂_3 ‖c_k‖,    (4.9)

where κ̂_3 > 0. Let γ̃_ω := κ̂_3/(1 − t). Then (4.5) holds for ω ≥ γ̃_ω.

For Algorithms 3 and 6, (3.7) implies that

g_k^T d_k = g_k^T u_k + g_k^T v_k = −u_k^T W_k u_k − u_k^T W_k v_k + u_k^T W_k P_k W_k^{-1} r_k + g_k^T v_k.    (4.10)

Hence

q_ω(x_k, 0) − q_ω(x_k, d_k) = (1/2) u_k^T W_k u_k − (1/2) v_k^T W_k v_k − u_k^T W_k P_k W_k^{-1} r_k − g_k^T v_k + ω(‖c_k‖ − ‖r_k‖).    (4.11)

Similarly, we can prove that there exists a γ̃_ω such that (4.5) holds for ω ≥ γ̃_ω. □

Lemma 4.3. Suppose Assumptions L hold. There exists a neighborhood B_1 of x∗ such that, if x_k ∈ B_1 and (2.4) holds for α_{k,l} = 1, then the trial step is accepted by the Armijo condition (2.5).
Proof. If (2.4) holds, we have

h(x_k) < τ^{−1/s_h} (−g_k^T d_k)^{s_f/s_h}.    (4.12)

Since s_f > 2s_h and g_k is uniformly bounded, (4.12) implies that h(x_k) = o(‖d_k‖^2).

For Algorithms 1 and 4, (1.4b) and (2.2b) imply that

g_k^T d̂_k = −g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} c(x_k + d_k) = λ_k^T c(x_k + d_k).    (4.13)

For Algorithms 2 and 5, (1.4a) and (2.2a) imply that

g_k^T d̂_k = −g_k^T A_k [A_k^T A_k]^{-1} c(x_k + d_k) = λ_k^T c(x_k + d_k).    (4.14)

For Algorithms 3 and 6, (1.4c) and (2.2b) imply that

g_k^T d̂_k = −g_k^T W_k^{-1} A_k [A_k^T W_k^{-1} A_k]^{-1} c(x_k + d_k) = λ_k^T c(x_k + d_k) + O(‖c_k‖) = λ_k^T c(x_k + d_k) + o(‖d_k‖^2).    (4.15)
Using second order Taylor expansions, (4.13)–(4.15) imply

g_k^T d̂_k = λ_k^T [ c(x_k) + A(x_k)^T d_k ] + (1/2) d_k^T ( Σ_{i=1}^m λ_{k,i} ∇^2 c_i(x_k) ) d_k + o(‖d_k‖^2)
  = (1/2) d_k^T ( Σ_{i=1}^m λ_{k,i} ∇^2 c_i(x_k) ) d_k + o(‖d_k‖^2).    (4.16)

By (4.6), (4.8), (4.10), ‖r_k‖ = O(‖c_k‖) and ‖v_k‖ = O(‖c_k‖), we easily get

g_k^T d_k = −u_k^T W_k u_k + O(‖c_k‖) = −u_k^T W_k u_k + o(‖d_k‖^2).    (4.17)

Hence, by Assumption L5, (4.16) and (4.17), we have

f(x_k) − f(x_k + d_k + d̂_k) + β g_k^T d_k
  = −g_k^T (d_k + d̂_k) − (1/2)(d_k + d̂_k)^T ∇^2 f(x_k)(d_k + d̂_k) + β g_k^T d_k + o(‖d_k‖^2)
  = (β − 1) g_k^T d_k − (1/2) d_k^T ( Σ_{i=1}^m λ_{k,i} ∇^2 c_i(x_k) ) d_k − (1/2) d_k^T ∇^2 f(x_k) d_k + o(‖d_k‖^2)    [by (4.16)]
  = (1/2 − β) u_k^T W_k u_k − u_k^T W_k v_k − (1/2) v_k^T W_k v_k + o(‖d_k‖^2)    [by (4.17)]
  = (1/2 − β) u_k^T W_k u_k + O(‖c_k‖) + o(‖d_k‖^2)
  = (1/2 − β) u_k^T W_k u_k + o(‖d_k‖^2).    (4.18)
From Assumption L4, β ∈ (0, 1/2) and (4.18), we conclude that the Armijo condition (2.5) holds for k sufficiently large. □

Lemma 4.4. Suppose Assumptions L hold. There exist a neighborhood B_2 ⊆ B_1 and a constant γ_ω > 0 such that, for ω ≥ γ_ω and x_k ∈ B_2,

φ_ω(x_k) − φ_ω(x_k + d_k + d̂_k) ≥ ((1 + γ_h)/2) [q_ω(x_k, 0) − q_ω(x_k, d_k)] ≥ 0.    (4.19)

Proof. From (4.2), (4.16) and Lemma 4.1, we have

φ_ω(x_k) − φ_ω(x_k + d_k + d̂_k) = f_k + ω‖c_k‖ − f(x_k + d_k + d̂_k) − ω‖c(x_k + d_k + d̂_k)‖
  = ω‖c_k‖ − g_k^T (d_k + d̂_k) − (1/2)(d_k + d̂_k)^T ∇^2 f(x_k)(d_k + d̂_k) + o(‖d_k‖^2)
  = ω‖c_k‖ − g_k^T d_k − (1/2) d_k^T ( Σ_{i=1}^m λ_{k,i} ∇^2 c_i(x_k) ) d_k − (1/2) d_k^T ∇^2 f(x_k) d_k + o(‖d_k‖^2)
  = ω‖c_k‖ − g_k^T d_k − (1/2) d_k^T W_k d_k + o(‖d_k‖^2)
  = ω‖c_k‖ + u_k^T W_k u_k − (1/2) d_k^T W_k d_k + O(‖c_k‖) + o(‖d_k‖^2)    [by (4.17)]
  = ω‖c_k‖ + (1/2) u_k^T W_k u_k + O(‖c_k‖) + o(‖d_k‖^2).    (4.20)

We have, from (3.3), (4.7), (4.9) and (4.11), that

q_ω(x_k, 0) − q_ω(x_k, d_k) = ω(‖c_k‖ − ‖r_k‖) + (1/2) u_k^T W_k u_k + O(‖c_k‖).

Hence, from Assumption L4, (4.20), γ_h ∈ (0, 1) and ‖r_k‖ = O(‖c_k‖), we obtain

φ_ω(x_k) − φ_ω(x_k + d_k + d̂_k) − ((1 + γ_h)/2) [q_ω(x_k, 0) − q_ω(x_k, d_k)]
  = ω‖c_k‖ + (1/2) u_k^T W_k u_k − ((1 + γ_h)/2) [ ω(‖c_k‖ − ‖r_k‖) + (1/2) u_k^T W_k u_k + O(‖c_k‖) ] + o(‖d_k‖^2)
  ≥ ((1 − γ_h)/2) ω‖c_k‖ + ((1 − γ_h)/4) u_k^T W_k u_k + O(‖c_k‖) + o(‖d_k‖^2)
  ≥ ((1 − γ_h)/2) ω‖c_k‖ − κ_4 ‖c_k‖,

which means that, for ω ≥ 2κ_4/(1 − γ_h), φ_ω(x_k) − φ_ω(x_k + d_k + d̂_k) ≥ ((1 + γ_h)/2) [q_ω(x_k, 0) − q_ω(x_k, d_k)]. Define γ_ω := max{ γ̃_ω, 2κ_4/(1 − γ_h) }. From Lemma 4.2, we conclude that (4.19) holds for ω ≥ γ_ω. □
Table 2
Key for Table 3.

Symbol   Meaning
Name     Name of problem
n        Number of variables
m        Number of constraints
ni       Number of iterations
nf       Number of calculations of f
nc       Number of calculations of c
na       Number of augmentations of the filter
nfg      Number of gradient evaluations of f
ncg      Number of gradient evaluations of c
s        Total number of outer iterations derived from Algorithm INS in [3]
From [11], we know that the next three choices of step are also second order correction steps for d_k, and Lemma 4.4 is true for each of them:

d̄_k = σ_k d̂_k + d_{k+1} + σ_{k+1} d̂_{k+1},
d̄_k = σ_k d̂_k + d_{k+1} + σ_{k+1} d̂_{k+1} + d_{k+2} + σ_{k+2} d̂_{k+2},  or
d̄_k = σ_k d̂_k + d_{k+1} + σ_{k+1} d̂_{k+1} + d_{k+2} + σ_{k+2} d̂_{k+2} + d_{k+3} + σ_{k+3} d̂_{k+3},

where σ_k, σ_{k+1}, σ_{k+2}, σ_{k+3} ∈ {0, 1}.

Lemma 4.5. Suppose Assumptions L hold. There exist constants ω_1, ω_2, ω_3 > 0 with

2γ_h ω_2 < (1 − t)(1 + γ_h)(ω_2 − ω_1) − 2γ_f,    (4.21a)

2ω_3 ≥ (1 − t)(1 + γ_h) ω_1 + [(1 − γ_h) + t(1 + γ_h)] ω_2,    (4.21b)

such that ω_1, ω_2, ω_3 > γ_ω, where γ_ω is defined in Lemma 4.4.

Proof. We can find a constant ω_1 such that ω_1 > γ_ω. Next, we define

ω_2 := [(1 − t)(1 + γ_h) ω_1 + 3γ_f] / [(1 − t)(1 + γ_h) − 2γ_h]  and  ω_3 := (1 − γ_h) ω_2 − γ_f,    (4.22)

where t ∈ (0, (1 − γ_h)/(1 + γ_h)); note that this range of t guarantees (1 − t)(1 + γ_h) − 2γ_h > 0, so ω_2 is well defined. It is easy to verify that ω_2, ω_3 ≥ ω_1 > γ_ω and that (4.21) holds. □
We define the level set L := {x ∈ B_2: φ_{ω_3}(x) ≤ φ_{ω_3}(x∗) + δ}. Choose δ > 0 such that for x ∈ L we have (h(x), f(x)) ∉ F_{K_1}. Set K_2 as the first iteration satisfying K_2 ≥ K_1 and x_{K_2} ∈ L. For k ∈ N, we define the filter building blocks

G_k = {(h, f): h ≥ (1 − γ_h) h(x_k) and f ≥ f(x_k) − γ_f h(x_k)}.

Define the index sets I_{k_1}^{k_2} = {l = k_1, . . ., k_2 − 1: l ∈ A} for k_1 ≤ k_2. Hence, we have

F_{k_2} = F_{k_1} ∪ ⋃_{l ∈ I_{k_1}^{k_2}} G_l.    (4.23)
Lemma 4.6. Suppose Assumptions L hold. When l ≥ K_1 with h(x_l) > 0, we have:
(i) If φ_{ω_2}(x_l) − φ_{ω_2}(x) ≥ ((1 + γ_h)/2) [q_{ω_2}(x_l, 0) − q_{ω_2}(x_l, d_l)], then (h(x), f(x)) ∉ G_l.
(ii) If x ∈ L and φ_{ω_2}(x_{K_2}) − φ_{ω_2}(x) ≥ ((1 + γ_h)/2) [q_{ω_2}(x_{K_2}, 0) − q_{ω_2}(x_{K_2}, d_{K_2})], then (h(x), f(x)) ∉ F_{K_2}.

Proof. Firstly, we show that (i) is true. When φ_{ω_2}(x_l) − φ_{ω_2}(x) ≥ ((1 + γ_h)/2) [q_{ω_2}(x_l, 0) − q_{ω_2}(x_l, d_l)], by (2.8) and (4.3) we get

φ_{ω_2}(x_l) − φ_{ω_2}(x) ≥ ((1 + γ_h)/2) [q_{ω_2}(x_l, 0) − q_{ω_2}(x_l, d_l)]
  = ((1 + γ_h)/2) [q_{ω_1}(x_l, 0) − q_{ω_1}(x_l, d_l)] + ((1 + γ_h)/2)(ω_2 − ω_1) [ h(x_l) − ‖A_l^T d_l + c_l‖ ]
  ≥ ((1 + γ_h)/2) (ω_2 − ω_1) [h(x_l) − ‖r_l‖]
  ≥ ((1 − t)(1 + γ_h)/2) (ω_2 − ω_1) h(x_l).    (4.24)
Table 3
Experimental results of Algorithms 1 and 2.

                          Algorithm 1                    Algorithm 2
Name      n   m   s       ni   nf   nfg  nc   ncg  na    ni   nf   nfg  nc   ncg  na
bt2       3   1   18      6    10   7    12   7    6     6    10   7    12   7    6
bt3       5   3   1       5    6    6    6    6    0     6    7    7    7    7    0
bt6       5   2   12      24   43   25   51   25   1     23   31   24   33   24   0
bt10      2   2   13      6    9    7    10   7    0     7    10   8    11   8    0
bt11      5   3   9       5    6    6    6    6    5     5    6    6    6    6    5
bt12      5   3   9       11   14   12   15   12   8     11   14   12   15   12   8
fccu      19  8   12      17   21   18   22   18   6     17   21   18   22   18   6
genhs28   10  8   4       5    6    6    6    6    3     8    9    9    9    9    6
hs008     2   2   5       11   14   12   15   12   0     11   14   12   15   12   0
hs027     3   1   14      33   40   34   42   34   0     33   40   34   42   34   0
hs028     3   1   1       22   32   23   34   23   0     23   33   24   35   24   0
hs040     4   3   6       7    8    8    8    8    0     7    8    8    8    8    0
hs046     5   2   110     10   16   11   17   11   5     11   17   12   18   12   5
hs048     5   2   1       8    11   9    12   9    5     9    14   10   16   10   5
hs049     5   2   41      22   26   23   27   23   21    25   29   26   30   26   23
hs050     5   3   8       7    8    8    8    8    7     7    8    8    8    8    7
hs051     5   3   2       13   15   14   16   14   12    15   19   16   20   16   14
hs052     5   3   1       8    23   9    25   9    8     8    23   9    25   9    8
hs077     5   2   11      46   118  47   131  47   0     54   120  55   133  55   0
hs078     5   3   5       3    4    4    4    4    0     3    4    4    4    4    0
hs079     5   3   12      3    11   4    12   4    1     3    4    4    4    4    0
maratos   2   1   4       7    16   8    17   8    7     7    16   8    17   8    7
mwright   5   3   7       6    7    7    7    7    6     6    7    7    7    7    6
If f(x) < f(x_l) − γ_f h(x_l), then (h(x), f(x)) ∉ G_l. Otherwise, we have f(x) ≥ f(x_l) − γ_f h(x_l). From (4.2), (4.21a), (4.24) and h(x_l) > 0, we can obtain

h(x_l) − h(x) ≥ ((1 − t)(1 + γ_h)/(2ω_2)) (ω_2 − ω_1) h(x_l) + (1/ω_2) [f(x) − f(x_l)]
  ≥ ((1 − t)(1 + γ_h)/(2ω_2)) (ω_2 − ω_1) h(x_l) − (γ_f/ω_2) h(x_l) > γ_h h(x_l),    (4.25)

which implies h(x) < (1 − γ_h) h(x_l) and hence (h(x), f(x)) ∉ G_l. Hence, (i) holds.
Next, we show that (ii) is true. Since x ∈ L, we have (h(x), f(x)) ∉ F_{K_1}. From (4.23), we only need to show that (h(x), f(x)) ∉ G_l for all l ∈ I_{K_1}^{K_2}. By (2.8) and (4.3), we get
φ_{ω_2}(x_{K_2}) − φ_{ω_2}(x) ≥ ((1 + γ_h)/2) [q_{ω_1}(x_{K_2}, 0) − q_{ω_1}(x_{K_2}, d_{K_2})] + ((1 + γ_h)/2)(ω_2 − ω_1) h(x_{K_2})
  ≥ ((1 + γ_h)/2) (ω_2 − ω_1) [h(x_{K_2}) − ‖r_{K_2}‖]
  ≥ ((1 − t)(1 + γ_h)/2) (ω_2 − ω_1) h(x_{K_2}).    (4.26)

From (4.2), (4.21b), (4.26), the definition of K_2 and l < K_2, we can obtain

φ_{ω_3}(x_l) > φ_{ω_3}(x_{K_2}) = φ_{ω_2}(x_{K_2}) + (ω_3 − ω_2) h(x_{K_2})
  ≥ φ_{ω_2}(x) + [ ω_3 − ((1 − t)(1 + γ_h)/2) ω_1 − (((1 − γ_h) + t(1 + γ_h))/2) ω_2 ] h(x_{K_2})
  ≥ φ_{ω_2}(x).    (4.27)

If f(x) < f(x_l) − γ_f h(x_l), then (h(x), f(x)) ∉ G_l. Otherwise, we have f(x) ≥ f(x_l) − γ_f h(x_l). This, along with (4.2), (4.22) and (4.27), implies
h(x) = (1/ω_2) [φ_{ω_2}(x) − f(x)] < (1/ω_2) [φ_{ω_3}(x_l) − f(x)] = (1/ω_2) [f(x_l) + ω_3 h(x_l) − f(x)]
  ≤ ((ω_3 + γ_f)/ω_2) h(x_l) = (1 − γ_h) h(x_l),    (4.28)

which implies (h(x), f(x)) ∉ G_l. Hence, (ii) holds. □
Table 4
Experimental results of Algorithms 3 and 4.

                          Algorithm 3                    Algorithm 4
Name      n   m   s       ni   nf   nfg  nc   ncg  na    ni   nf   nfg  nc   ncg  na
bt2       3   1   18      6    10   7    12   7    6     6    10   7    12   7    6
bt3       5   3   1       6    7    7    7    7    0     5    6    6    6    6    0
bt6       5   2   12      52   65   53   68   53   0     26   37   27   39   27   1
bt10      2   2   13      7    10   8    11   8    0     6    9    7    10   7    0
bt11      5   3   9       5    6    6    6    6    5     5    6    6    6    6    5
bt12      5   3   9       10   13   11   14   11   7     9    12   10   13   10   6
fccu      19  8   12      17   21   18   22   18   6     17   21   18   22   18   6
genhs28   10  8   4       8    9    9    9    9    6     5    6    6    6    6    3
hs008     2   2   5       11   14   12   15   12   0     11   14   12   15   12   0
hs027     3   1   14      33   40   34   42   34   0     33   40   34   42   34   0
hs028     3   1   1       23   33   24   35   24   0     22   32   23   34   23   0
hs040     4   3   6       7    8    8    8    8    0     7    8    8    8    8    0
hs046     5   2   110     12   18   13   19   13   6     7    13   8    14   8    4
hs048     5   2   1       9    14   10   16   10   5     8    11   9    12   9    5
hs049     5   2   41      25   29   26   30   26   23    22   26   23   27   23   21
hs050     5   3   8       8    9    9    9    9    8     7    8    8    8    8    7
hs051     5   3   2       15   17   16   18   16   14    13   14   14   14   14   12
hs052     5   3   1       8    23   9    25   9    8     8    23   9    25   9    8
hs077     5   2   11      54   120  55   133  55   0     59   130  60   142  60   0
hs078     5   3   5       3    4    4    4    4    0     3    4    4    4    4    0
hs079     5   3   12      3    4    4    4    4    0     3    11   4    12   4    1
maratos   2   1   4       7    16   8    17   8    7     7    16   8    17   8    7
mwright   5   3   7       6    7    7    7    7    6     7    8    8    8    8    7
Table 5
Experimental results of Algorithms 5 and 6.

                          Algorithm 5                    Algorithm 6
Name      n   m   s       ni   nf   nfg  nc   ncg  na    ni   nf   nfg  nc   ncg  na
bt2       3   1   18      6    10   7    12   7    6     6    10   7    12   7    6
bt3       5   3   1       6    7    7    7    7    0     6    7    7    7    7    0
bt6       5   2   12      80   207  81   232  81   5     32   58   33   63   33   2
bt10      2   2   13      7    10   8    11   8    0     7    10   8    11   8    0
bt11      5   3   9       5    6    6    6    6    5     6    7    7    7    7    6
bt12      5   3   9       9    12   10   13   10   6     11   14   12   15   12   8
fccu      19  8   12      18   23   19   25   19   7     17   21   18   22   18   6
genhs28   10  8   4       8    9    9    9    9    6     8    9    9    9    9    6
hs008     2   2   5       11   14   12   15   12   0     11   14   12   15   12   0
hs027     3   1   14      33   40   34   42   34   0     33   40   34   42   34   0
hs028     3   1   1       23   33   24   35   24   0     23   33   24   35   24   0
hs040     4   3   6       7    8    8    8    8    0     7    8    8    8    8    0
hs046     5   2   110     11   17   12   18   12   6     12   18   13   19   13   6
hs048     5   2   1       9    14   10   16   10   5     9    14   10   16   10   5
hs049     5   2   41      23   27   24   28   24   21    23   27   24   28   24   21
hs050     5   3   8       7    8    8    8    8    7     8    9    9    9    9    8
hs051     5   3   2       16   23   17   24   17   15    12   13   13   13   13   11
hs052     5   3   1       8    23   9    25   9    8     8    23   9    25   9    8
hs077     5   2   11      54   120  55   133  55   0     54   120  55   133  55   0
hs078     5   3   5       3    4    4    4    4    0     3    4    4    4    4    0
hs079     5   3   12      3    4    4    4    4    0     3    4    4    4    4    0
maratos   2   1   4       7    16   8    17   8    7     7    16   8    17   8    7
mwright   5   3   7       6    7    7    7    7    6     6    7    7    7    7    6
Now we can state the main local convergence theorem, which can be proved in a way similar to Theorem 4.7 in [11].

Theorem 4.1. Suppose Assumptions L hold. Then, for k sufficiently large, the sequence {x_k} converges to x∗ at a superlinear rate.
5. Numerical results

Numerical experiments on the method given in this paper were performed on a personal computer with an Intel Pentium Dual CPU (2.00 GHz). In this section, we present the numerical results of the proposed algorithm and compare them with the method considered in [3]. In [3], Byrd et al. proposed an inexact Newton line search algorithm (Algorithm INS) without filter technique for large-scale equality constrained optimization, so the two methods are essentially comparable.
The numerical results have been obtained by running our algorithm on a set of 23 equality constrained problems. These problems, which are those labeled with a (∗) in Table 3 of [3], were taken from the CUTEr [1,8] collection. The starting point supplied with each problem was used. All codes were written in Matlab. All attempts to solve the test problems were limited to a maximum of 1000 iterations. The computation terminates when the following stopping criterion is satisfied:

‖∇_x L(x_k, λ_k)‖_1 + ‖c_k‖_1 ≤ 10^{-5}.

We set the parameters in the algorithm as follows: β = 0.3, γ_h = 0.01, γ_f = 0.5, τ = 1, ν = 10^4, s_h = 1.5, s_f = 3.2, γ_α = 0.5 and ς = 0.5, which seemed to work reasonably well for a broad class of problems.
Detailed numerical results can be found in Tables 3–5, where the headers are described in Table 2. For comparison, we list the iteration count of Algorithm INS [3], denoted by s. We can see from these tables that the run for each problem terminated successfully, so Algorithm A is feasible and effective. In most cases the numbers of iterations and function evaluations are small, and the method is competitive with Algorithm INS [3]. It is particularly encouraging that Algorithm A approximates a fast Newton method on the same equality constrained problems.
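For reference, the experimental setup above can be captured in a few lines. The names below are our own; only the parameter values and the stopping test come from this section.

```python
import numpy as np
from types import SimpleNamespace

# Parameter values used in the experiments of Section 5.
params = SimpleNamespace(beta=0.3, gamma_h=0.01, gamma_f=0.5, tau=1.0,
                         nu=1e4, s_h=1.5, s_f=3.2, gamma_alpha=0.5,
                         sigma=0.5, eps=1e-5, max_iter=1000)

def converged(grad_lag, c, eps=1e-5):
    """Stopping criterion: the l1 norm of the Lagrangian gradient plus
    the l1 norm of the constraint values falls below 10^-5."""
    return np.linalg.norm(grad_lag, 1) + np.linalg.norm(c, 1) <= eps
```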
References

[1] I. Bongartz, A.R. Conn, N.I.M. Gould, Ph.L. Toint, CUTE: constrained and unconstrained testing environment, ACM Trans. Math. Softw. 21 (1995) 123–160.
[2] R.H. Byrd, F.E. Curtis, J. Nocedal, An inexact SQP method for equality constrained optimization, SIAM J. Optim. 19 (2008) 351–369.
[3] R.H. Byrd, F.E. Curtis, J. Nocedal, An inexact Newton method for nonconvex equality constrained optimization, Math. Program., Ser. A 122 (2010) 273–299.
[4] C.M. Chin, R. Fletcher, On the global convergence of an SLP-filter algorithm that takes EQP steps, Math. Program. 96 (2003) 161–177.
[5] R. Fletcher, N.I.M. Gould, S. Leyffer, Ph.L. Toint, A. Wächter, Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming, SIAM J. Optim. 13 (2002) 635–660.
[6] R. Fletcher, S. Leyffer, Nonlinear programming without a penalty function, Math. Program. 91 (2002) 239–269.
[7] R. Fontecilla, Local convergence of secant methods for nonlinear constrained optimization, SIAM J. Numer. Anal. 25 (1988) 692–712.
[8] N.I.M. Gould, D. Orban, Ph.L. Toint, CUTEr and SifDec: a constrained and unconstrained testing environment, revisited, ACM Trans. Math. Softw. 29 (2003) 373–394.
[9] E. Karas, A. Ribeiro, C. Sagastizábal, M. Solodov, A bundle-filter method for nonsmooth convex constrained optimization, Math. Program. 116 (2009) 297–320.
[10] A. Wächter, L.T. Biegler, Line search filter methods for nonlinear programming: motivation and global convergence, SIAM J. Optim. 16 (2005) 1–31.
[11] A. Wächter, L.T. Biegler, Line search filter methods for nonlinear programming: local convergence, SIAM J. Optim. 16 (2005) 32–48.
[12] X. Zhu, D. Pu, A restoration-free filter SQP algorithm for equality constrained optimization, Appl. Math. Comput. 219 (2013) 6016–6029.
[13] Z. Wang, D. Zhu, Projected affine-scaling interior-point Newton's method with line search filter for box constrained optimization, Appl. Math. Comput. 230 (2014) 484–495.