Full Nesterov–Todd step infeasible interior-point method for symmetric optimization

Full Nesterov–Todd step infeasible interior-point method for symmetric optimization

European Journal of Operational Research 214 (2011) 473–484 Contents lists available at ScienceDirect European Journal of Operational Research journ...

336KB Sizes 0 Downloads 10 Views

European Journal of Operational Research 214 (2011) 473–484

Contents lists available at ScienceDirect

European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor

Continuous Optimization

Full Nesterov–Todd step infeasible interior-point method for symmetric optimization G. Gu a,⇑, M. Zangiabadi b, C. Roos c a

Department of Mathematics, Nanjing Uinversity, China Department of Mathematical Science, Shahrekord University, P.O. Box 115, Shahrekord, Iran c Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands b

a r t i c l e

i n f o

Article history: Received 4 August 2009 Accepted 18 February 2011 Available online 24 February 2011 Keywords: Interior point methods Conic programming Nesterov–Todd step

a b s t r a c t Euclidean Jordan algebras were proved more than a decade ago to be an indispensable tool in the unified study of interior-point methods. By using it, we generalize the full-Newton step infeasible interior-point method for linear optimization of Roos [Roos, C., 2006. A full-Newton step O(n) infeasible interior-point algorithm for linear optimization. SIAM Journal on Optimization. 16 (4), 1110–1136 (electronic)] to symmetric optimization. This unifies the analysis for linear, second-order cone and semidefinite optimizations. Ó 2011 Elsevier B.V. All rights reserved.

1. Introduction During the last two decades, major developments in convex optimization were focusing on Conic Optimization (CO), which optimizes a linear objective function subject to linear constraints and over a pointed, closed, convex cone. The foundation for solving these problems using interior-point methods (IPMs) was laid by Nesterov and Nemirovskii (1994). Their methods were primarily either primal or dual based. Later, Nesterov and Todd (1997, 1998) provided a theoretical foundation of efficient primal–dual IPMs on a special class of CO, where the associated cone is selfscaled. Güler (1996) observed that the self-scaled cones are precisely symmetric cones, which have been much studied and even characterized. See, for example, the comprehensive book of Faraut and Korányi (1994). This special subclass of CO is named as symmetric optimization (SO), which includes linear optimization, second-order cone optimization, and semidefinite optimization as special cases. Faybusovich is the first who has exploited the advantages given by the Jordan algebraic setting in the study of SO. He started his in-depth research by a study of non-degeneracy conditions for SO in Faybusovich (1997b). Subsequently, he analyzed various interior-point strategies for SO in Faybusovich (1997a, 2002), where Jordan algebras played a crucial role. The ideas of Faybusovich have been followed by many others. For instance, Sturm (2000) presented the theoretical basis of his SeDuMi software in terms of Jordan algebras. Later, Schmieta and Alizadeh ⇑ Corresponding author. Tel.: +86 13851895318. E-mail addresses: [email protected] (G. Gu), [email protected](M. Zangiabadi), [email protected] (C. Roos). 0377-2217/$ - see front matter Ó 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2011.02.022

(2001, 2003), and Rangarajan (2006) used this setting in further studies of IPMs for SO. Recently, Roos (2006) proposed a new Infeasible IPM (IIPM) for LO. It differs from the classical IIPMs (e.g. Kojima et al., 1993; Lustig, 1990/91. ; Mizuno, 1994; Potra, 1996; Zhang, 1994) in that the new method uses only full steps (instead of damped steps), which has the advantage that no line searches are needed. The motivation for the use of full steps is that, though such methods are less greedy, the best complexity results for IPMs are obtained for such methods. In this paper, we generalize Roos’s full-Newton step IIPM for LO to full Nesterov–Todd step (NT-step) IIPM for SO by using Jordan algebras. The paper is organized as follows. In Section 2 we briefly recall or derive some properties of symmetric cones and its associated Euclidean Jordan algebras, focussing on what is needed in the analysis later on. In Section 3 we recall the notions of central path and NT-step, where a unified proof of the quadratic convergence is given in the framework of Euclidean Jordan algebras. Then, in Section 4 we present our full NT-step IIPM for SO. The iteration bound we get coincides with the best known iteration bound for IIPMs. Finally, Section 5 contains some conclusions and topics for further research.

2. Euclidean Jordan algebras This section offers a brief introduction to the theory of Euclidean Jordan algebras and symmetric cones. We restrict ourselves to finite-dimensional algebras over the real field R. For omitted

474

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

proofs and more details, we refer to the given references and also to Faraut and Korányi (1994), and Vieira (2007). Definition 2.1 (Euclidean Jordan algebra). Let J be a finite-dimensional R-algebra along with a bilinear map  : J  J #J . Then ðJ ; Þ is called a Euclidean Jordan algebra if for all x; y 2 J the following holds: (i) x  y = y  x (Commutativity); (ii) x  (x2  y) = x2  (x  y), where x2 = x  x (Jordan’s Axiom); (iii) there exists an inner product such that hx  y, zi = hx, y  zi. We always assume that there exists an identity element e such that e  x = x  e = x for all x 2 J . For an element x 2 J , let LðxÞ : J #J be the linear map defined by L(x)y :¼ x  y, for all y 2 J . Moreover, we define P(x) = 2L(x)2  L(x2), where L(x)2 = L(x)L(x). The map P() is called the quadratic representation of J . Below we list several important properties of quadratic representation. Proposition 2.2 (cf. Faraut and Korányi, 1994, Proposition II.3.1). An element x 2 J is invertible if and only if P(x) is invertible. In this case, P(x)x1 = x and P(x)1 = P(x1). Proposition 2.3 (cf. II.3.3). One has:

Faraut

and

Korányi,

1994,

Proposition

(i) The differential of the map x ´ x1 is P(x)1. (ii) If x and y are invertible, then P(x)y is invertible and

ðPðxÞyÞ1 ¼ Pðx1 Þy1 : (iii) For any two elements x and y : P(P(y)x) = P(y)P(x)P(y). In particular, the equation in (iii) of the above proposition is known as the fundamental formula. For x 2 J , let r be the smallest integer such that e, x, x2, . . . , xr are linearly dependent. Then r is defined as the degree of the element x, denoted as deg (x). We define the rank of J as r :¼ maxfdegðxÞ : x 2 J g. An element c 2 J is said to be an idempotent if c2 = c. Two idempotents c1 and c2 are said to be orthogonal if c1  c2 = 0. Moreover, an idempotent is primitive if it is non-zero and cannot be written as the sum of two (necessarily orthogonal) non-zero idempotents. We say that {c1, . . . , cr} is a Jordan frame, if each ci is a Pr primitive idempotent, ci  cj = 0 (i – j) and i¼1 ci ¼ e. Jordan frames play a crucial role in characterizing the elements of Euclidean Jordan algebras. The result is known as ‘‘spectral theorem’’, which is recalled below. Theorem 2.4 (Spectral theorem, cf. Faraut and Korányi, 1994, Theorem III.1.2). Suppose J has rank r. Then for x in J there exist a Jordan frame c1, c2, . . . , cr and real numbers k1, k2, . . . , kr such that P x ¼ ri¼1 ki ci . The numbers ki (with their multiplicities) are uniquely determined by x. We call the above k1, . . . , kr the eigenvalues of x. To express their dependence on x, we denote them as k1(x), . . . , kr(x), or simply as a vector k(x) 2 Rr. Moreover, we denote the largest eigenvalue of x as kmax(x), and analogously the smallest one as kmin(x). Based on the P eigenvalues, we define trace and determinant as trðxÞ ¼ ri ki and Qr detðxÞ ¼ i¼1 ki , respectively. By using eigenvalues, we may extend the definition of any real valued, continuous univariate function to elements of a Euclidean Jordan algebra. Particularly, we have: 1 Inverse: x1 :¼ k1 1 c1 þ    þ kr ck , whenever all ki – 0 and undefined otherwise;

1=2

Square root: x1=2 :¼ k1 c1 þ    þ k1=2 r c k , whenever all ki P 0 and undefined otherwise; Square: x2 :¼ k21 c1 þ    þ k2r ck . The norm induced by the inner product is named as the Frobepffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffi nius norm, which is given by kxkF :¼ hx; xi ¼ trðx2 Þ. The above norm can also be obtained via eigenvalues. By Theorem 2.4, P x 2 J has a spectral decomposition x ¼ ri¼1 ki ðxÞci , and x2 ¼ pffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Pr 2 ffi Pr 2 Hence, kxkF ¼ hx; xi ¼ trðx2 Þ ¼ i¼1 ki ðxÞc i . i¼1 ki ðxÞ ¼ kkðxÞk. Note that since e has eigenvalue 1, with multiplicity r, it folpffiffiffi lows that tr (e) = r, det (e) = 1 and kekF ¼ r. We define the cone of squares of J as KðJ Þ :¼ fx2 : x 2 J g. The following theorem brings together some major properties of the cone of squares KðJ Þ. Theorem 2.5 (cf. Faraut and Korányi, 1994, Theorem III.2.1, Proposition III.2.2). Let J be a Euclidean Jordan algebra, then KðJ Þ is a symmetric cone, and is the set of elements x in J for which L(x) is positive semidefinite. Furthermore, if x is invertible, then PðxÞintKðJ Þ ¼ intKðJ Þ. A convex cone is said to be symmetric if it is self-dual and homogeneous. Theorem 2.6 (cf. Faraut and Korányi, 1994, Section III.2–5). A cone is symmetric if and only if it is the cone of squares of some Euclidean Jordan algebra. In the sequel, K will always denote a symmetric cone, and J a Euclidean Jordan algebra for which K is its cone of squares. The next proposition contains a result of crucial importance in the design of IPMs within the framework of Jordan algebras. Proposition 2.7 (cf. Faybusovich, 1997b, Lemma 2.2). Let x; s 2 K. Then tr (x  s) P 0. We have tr (x  s) = 0 if and only if x  s = 0. Next, we recall the existence and uniqueness of a scaling point w corresponding to any points x; s 2 intK, such that P(w) takes s into x. This was done by Nesterov and Todd (1997, 1998) for selfscaled cones. Later, Faybusovich (2002) derived it in the framework of Euclidean Jordan algebras. Proposition 2.8 (NT-scaling, cf. Faybusovich, 2002, Lemma 3.2). Given x; s 2 intK, there exists a unique w 2 intK such that x = P(w)s. Moreover,

w :¼ PðxÞ1=2 ðPðxÞ1=2 sÞ1=2

h

i ¼ PðsÞ1=2 ðPðsÞ1=2 xÞ1=2 ;

ð2:1Þ

and we call w the scaling point of x and s (in this order). We say that two elements x and s in J are similar, denoted as x  s, if x and s share the same set of eigenvalues. In what follows, we list some results regarding similarity. Lemma 2.9 Schmieta and Alizadeh, 2003, Proposition 21. Let x; s; u 2 intK. Defining ~ x ¼ PðuÞx and ~s ¼ Pðu1 Þs, one has Pð~ x1=2 Þ~sðx1=2 Þs. Lemma 2.10 (cf. Vieira, 2007, Proposition 3.2.4). Let x; s 2 intK, and w the scaling point of x and s, then (P(x1/2)s)1/2  P(w1/2)s. To analyze our IIPM algorithm, we need some inequalities, which are recalled or derived in the sequel. Lemma 2.11. Let x 2 K, then tr (x2) 6 tr (x)2. Proof. Since x 2 K, we have k(x) P 0. Therefore

475

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

trðx2 Þ ¼

r X

ki ðxÞ2 6

i¼1

r X

!2 ki ðxÞ

¼ trðxÞ2 :

Lemma 2.16. Let x; s 2 J , then kx  skF 6 12 kx2 þ s2 kF .

i¼1

This proves the lemma. h Lemma 2.12. Let x 2 J , then kx2 kF 6 kxk2F .

Proof. Since

x2 þ s2 þ 2x  s ¼ ðx þ sÞ2 2 K; x2 þ s2  2x  s ¼ ðx  sÞ2 2 K;

Proof. Using the definition of Frobenius norm, we have

we have hx2 + s2 + 2x  s, x2 + s2  2x  si P 0, which is equivalent to

 2 2 x  ¼ trððx2 Þ2 Þ 6 trðx2 Þ2 ¼ ðkxk2 Þ2 ; F F

hx2 þ s2 ; x2 þ s2 i  4hx  s; x  si P 0;

where the inequality follows from Lemma 2.11. Since both kx2kF and kxkF are nonnegative, the lemma follows. h

or, equivalently, lemma. h

Lemma 2.13. Let J be a Euclidean Jordan algebra, and x; s 2 J with hx, si = 0. Then one has (i)  14 kx þ sk2F eK x  sK 14 kx þ sk2F e; (ii) kx  skF 6 2p1 ffiffi2 kx þ sk2F .

Proof. We write x  s = 14 ((x + s)2  (x  s)2). Since ðx þ sÞ2 2 K, we have

Using ðx  sÞ2 K kmax ððx  sÞ2 ÞeK kx  sk2F e, it follows that x  s þ 14 kx  sk2F e 2 K, which means that  14 kx  sk2F eK x  s. In a similar way one derives that x  sK 14 kx þ sk2F e. Since hx, si = 0, it follows that kx  skF = kx + skF, and hence, part (i) of the lemma follows. For the proof of part (ii), we observe that

kx 

 2  1  1  2 2  2 2 2 ¼ 4 ððx þ sÞ  ðx  sÞ Þ ¼ 16 tr ððx þ sÞ  ðx  sÞ Þ F i 1 h 4 trððx þ sÞ Þ þ trððx  sÞ4 Þ  2trððx þ sÞ2  ðx  sÞ2 Þ : ¼ 16 2

This

implies

the

Lemma 2.17. Let x 2 J and s 2 K, then kmin(x)tr (s) 6 tr (x  s) 6 kmax(x) tr (s). Proof. For any x 2 J we have kmax ðxÞe  x 2 K. Furthermore, since s 2 K, it follows that tr ((kmax(x)e  x)  s) P 0. Hence, the second inequality in the lemma follows by writing tr (x  s) 6 tr (kmax (x)e  s) = kmax(x) tr (s). The proof of the first inequality goes in the similar way. h Lemma 2.18 (cf. Rangarajan, 2006, Lemma 2.9). Given x 2 intK, we have

1 x  s þ ðx  sÞ2 2 K: 4

sk2F

kx2 þ s2 k2F  4kx  sk2F P 0.

kx  x1 kF 6

kx2  ekF : kmin ðxÞ

Lemma 2.19 (cf. Schmieta and Alizadeh, 2003, Lemma 30). Let x; s 2 intK, then

kPðxÞ1=2 s  ekF 6 kx  s  ekF : Lemma 2.20 (cf. Sturm, 2000, Theorem 4). Let x; s 2 intK, then

kmin ðPðxÞ1=2 sÞ P kmin ðx  sÞ:

2

Since (x + s) and (x  s) belong to K, the trace of their product is nonnegative. Thus, we obtain

i 1 h trððx þ sÞ4 Þ þ trððx  sÞ4 Þ 16 i 1 h kðx þ sÞ2 k2F þ kðx  sÞ2 k2F : ¼ 16

kx  sk2F 6

Using Lemma 2.12 and kx + skF = kx  skF again, we get

kx  sk2F 6

i 1 1 h kx þ sk4F þ kx  sk4F ¼ kx þ sk4F : 16 8

This implies part (ii) of the lemma. Hence, the proof of the lemma is complete. h

3. Full NT-step for feasible IPM In preparation for dealing with our full NT-step IIPM, in this section we briefly recall the notions of central path and NT-step. The quadratic convergence result in Theorem 3.6 unifies the analysis known for LO (cf. Roos et al., 2006, Theorem II.50) and SDO (cf. de Klerk, 2002, Lemma 7.4). This property will be used later on when dealing with the full NT-step IIPM for SO, which is the main purpose of this paper. 3.1. The SO problem

Proposition 2.14 (cf. Faybusovich, 1997b, Theorem 5.13). Given x 2 intK, we have hx, si > 0, for all s 2 K n f0g.

Let J be a Euclidean Jordan algebra with rank r, and cone of squares (i.e., its associated symmetric cone) K. Consider the primal–dual pair of SO problems

Lemma 2.15. If x  s 2 intK, then det (x) – 0.

minfhc; xi : Ax ¼ b; x 2 Kg

ðCPÞ

and Proof. By Theorem 2.4, the element x can be write as P x ¼ ri¼1 ki ðxÞci , where {c1, . . . , cr} is a Jordan frame. Suppose det (x) = 0, then there must exist an integer k with 1 6 k 6 r, such that kk(x) = 0. Since x  s 2 intK and ck 2 K n f0g, by Proposition 2.14, 0 < hx  s, cki = hs, x  cki = 0. This contradiction completes the proof. h

T

maxfb y : AT y þ s ¼ c; y 2 Rm ; s 2 Kg:

ðCDÞ m

Here c and the rows of A lie in J , and b 2 R . Without loss of generality we assume that the rows of A are linearly independent. If ai is the ith row of A, then Ax = b means that hai, xi = bi, for each P i = 1, . . . , m, while ATy + s = c means m i¼1 yi ai þ s ¼ c.

476

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

We call (CP) feasible if there exists x 2 K such that Ax = b, and strictly feasible, if in addition, x 2 intK. Similarly, we call (CD) feasible if there exists ðy; sÞ 2 Rm  K such that ATy + s = c, and strictly feasible, if in addition s 2 intK. Let x and (y, s) be a primal–dual feasible pair, then T

P(u)x  P(u1)s = le, and then applying Newton’s method, we obtain the system

ADx ¼ 0; AT Dy þ Ds ¼ 0;

ð3:4Þ

1

1

1

PðuÞx  Pðu ÞDs þ Pðu Þs  PðuÞDx ¼ le  PðuÞx  Pðu Þs:

T

hc; xi  b y ¼ hAT y þ s; xi  b y ¼ hx; si P 0; where hx, si is called the duality gap. Theorem 3.1 (Ben-Tal and Nemirovski, 2001, Theorem 2.4.1). If the primal problem (CP) is strictly feasible and below bounded, then the dual (CD) is solvable and the optimal values in the problems coincide. Similarly, if the dual (CD) is strictly feasible and above bounded, then the primal (CP) is solvable and the optimal values coincide. If both of the problems are strictly feasible, then both of them are solvable, and the optimal values coincide.

Here we focus on the scaling point u = w1/2, where w is the NTscaling point of x and s as defined in Proposition 2.8. For that case we define

v

PðwÞ1=2 x :¼ pffiffiffiffi

PðwÞ1=2 s ¼ pffiffiffiffi

l

# ð3:5Þ

l

and

dx :¼ In the sequel, we always assume that both (CP) and (CD) are strictly feasible. Then it follows from Proposition 2.7 and Theorem 3.1 that finding an optimal solution of (CP) and (CD) is equivalent to solving the following system (cf. Faybusovich, 1997b; Schmieta and Alizadeh, 2003):

"

PðwÞ1=2 Dx ; pffiffiffiffi

ds :¼

l

PðwÞ1=2 Ds : pffiffiffiffi

ð3:6Þ

l

This enables us to rewrite the system (3.4) as follows:

pffiffiffiffi lAPðwÞ1=2 dx ¼ 0;

ð3:7Þ

pffiffiffiffi T Dy lAPðwÞ1=2 þ ds ¼ 0;

ð3:8Þ

x  s ¼ 0:

dx þ ds ¼ v 1  v :

ð3:9Þ

The basic idea of primal–dual IPMs is to replace the third equation in (3.1), the so-called complementarity condition, by the parameterized equation x  s = le, with l > 0. Thus, we consider the system

It easily follows that the above system has a unique solution. Since pffiffiffiffi (3.7) requires that dx belongs to the null space of lAPðwÞ1=2 , and pffiffiffiffi 1=2 (3.8) that ds belongs to the row space of lAPðwÞ , it follows that system (3.7)–(3.9) determines dx and ds uniquely as the (mutually orthogonal with respect to the trace inner product) components of the vector v1  v in these two spaces. From (3.9) and the orthogonality of dx and ds we obtain

Ax ¼ b;

x 2 K;

AT y þ s ¼ c;

Ax ¼ b;

s 2 K;

l

ð3:1Þ

x 2 K;

T

A y þ s ¼ c;

s 2 K;

ð3:2Þ

x  s ¼ le: For each l > 0 the system (3.2) has a unique solution (x(l), y(l), s(l)), and we call x(l) and (y(l), s(l)) the l-center of (CP) and (CD), respectively. The set of l-centers (with l running through all positive real numbers) gives a homotopy path, which is called the central path of (CP) and (CD). If l ? 0, then the limit of the central path exists and since the limit points satisfy the complementarity condition, the limit yields optimal solutions for (CP) and (CD) (Faybusovich, 1997b).

kdx k2F þ kds k2F ¼ kdx þ ds k2F ¼ kv 1  v k2F :

ð3:10Þ

To get the search directions Dx and Ds in the original space we simply transform the scaled search directions back to the x and s-space by using (3.6):

Dx ¼

pffiffiffiffi

pffiffiffiffi

lPðwÞ1=2 dx ; Ds ¼ lPðwÞ1=2 ds :

ð3:11Þ

The new iterate is obtained by taking a full NT-step, as follows:

xþ ¼ x þ Dx;

yþ ¼ y þ Dy;

zþ ¼ z þ Dz:

ð3:12Þ

3.2. The NT-search direction The natural way to define a search direction is to follow Newton’s approach and to linearize the third equation in (3.2). This leads to the system

ADx ¼ 0; AT Dy þ Ds ¼ 0;

ð3:3Þ

x  Ds þ s  Dx ¼ le  x  s: Due to the fact that x and s do not operator commute in general, i.e., L(x)L(s) – L(s)L(x), the system (3.3) does not always have a unique solution. In particular, for SDO, the system defines the AHO direction, which is not necessarily unique (Faybusovich, 1997b). It is now well known that this difficulty can be solved by applying a scaling scheme as follows (Schmieta and Alizadeh, 2003). Let u 2 intK, then we have

x  s ¼ le () PðuÞx  Pðu1 Þs ¼ le: Since x, s 2 intK, this is an easy consequence of Proposition 2.3 (ii), as becomes clear when using the fact that x  s = le holds if and only if x = ls1. Now, replacing the third equation in (3.3) by

3.3. The analysis of the NT-step For the analysis of the NT-step, we need to measure the distance of the iterate (x, y, s) to the current l-center (x(l), y(l), s(l)). The proximity measure that we are going to use is defined as follows:

dðx; s; lÞ  dðv Þ :¼ where

1 kv  v 1 kF ; 2

ð3:13Þ

v is defined in (3.5). It follows that

4d ðv Þ ¼ kv  v 1 k2F ¼ trðv 2 Þ þ trðv 2 Þ  2trðeÞ; 2

2

ð3:14Þ

2

which expresses d (v) in the eigenvalues of v and its inverse. Our first aim is to find a condition that guarantees feasibility of the iterate after a full NT-step. As before, let x; s 2 intK, l > 0 and let w be the scaling point of x and s. Using (3.5), (3.6) and (3.12), we obtain

pffiffiffiffi

lPðwÞ1=2 ðv þ dx Þ; sþ ¼ s þ Ds ¼ lPðwÞ1=2 ðv þ ds Þ:

xþ ¼ x þ Dx ¼

pffiffiffiffi

ð3:15Þ

477

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

Since P(w)1/2 and its inverse P(w)1/2 are automorphisms of intK (cf. Theorem 2.5), x+ and s+ will belong to intK if and only if v + dx and v + ds belong to intK. To find a feasibility condition, which is Lemma 3.3, we need the following lemma.

Lemma 3.5 (Vieira, 2007, Proposition 5.9.3). One has

Lemma 3.2. If d(v) < 1 then e þ dx  ds 2 intK.

Proof. It readily follows from (3.16) and Lemma 2.10 that

Proof. Since dx and ds are orthogonal with respect to the trace inner product, Lemma 2.13 implies that the absolute values of the eigenvalues of dx  ds do not exceed 14 kdx þ ds k2F . In addition, it follows from (3.10) and (3.13) that

kdx þ ds k2F ¼ kv  v 1 k2F ¼ 4dðv Þ2 : Hence, the absolute values of the eigenvalues of dx  ds do not exceed d(v)2. This implies that 1  d(v)2 is a lower bound for the eigenvalues of e + dx  ds. Hence, if d(v) < 1, then e þ dx  ds 2 intK. This proves the lemma. h Lemma 3.3. The full NT-step is strictly feasible if d(v) < 1. Proof. We introduce a step length a with 0 6 a 6 1, and define v ax ¼ v þ adx , v as ¼ v þ ads . We then have v 0x ¼ v , v 1x ¼ v þ dx , and similarly v 0s ¼ v , v 1s ¼ v þ ds . It follows from (3.9) that a

a

vx  vs

¼ ðv þ adx Þ  ðv þ ads Þ ¼ v þ av  ðdx þ ds Þ þ a dx  ds 2

2

¼ v 2 þ av  ðv 1  v Þ þ a2 dx  ds

vþ 

 1=2 Pðv þ dx Þ1=2 ðv þ ds Þ :



pffiffiffiffi

lv þ ¼ Pðwþ Þ1=2 sþ  Pðxþ Þ1=2 sþ

:

Due to (3.15) and Lemma 2.9, we may write

 1=2 Pðxþ Þ1=2 sþ ¼ lP PðwÞ1=2 ðv þ dx Þ PðwÞ1=2 ðv þ ds Þ  lPðv þ dx Þ1=2 ðv þ ds Þ: From this the lemma follows. h Theorem 3.6. If d :¼ d(v) < 1, then the full NT-step is strictly feasible and

d2 dðv þ Þ 6 qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : 2ð1  d2 Þ Proof. Since d :¼ d(v) < 1, from Lemma 3.3 and its proof, it follows that v + dx, v + ds, and (v + dx)  (v + ds) belong to the intK. Let us, for the moment, denote

u :¼ Pðv þ dx Þ1=2 ðv þ ds Þ;

¼ ð1  aÞv 2 þ ae þ a2 dx  ds :

1=2

 :¼ ðv þ dx Þ  ðv þ ds Þ: u

Then, it follows from Lemma 3.5 that

v+  u1/2. Therefore

Since d(v) < 1, Lemma 3.2 implies that dx  ds K  e. Substitution gives

2dðv þ Þ ¼ kv þ  ðv þ Þ1 kF ¼ ku1=2  u1=2 kF :

v ax  v as K ð1  aÞv 2 þ ae  a2 e ¼ ð1  aÞðv 2 þ aeÞ:

By applying Lemma 2.18, we obtain

If 0 6 a 6 1, the last vector belongs to K, i.e., we have v ax  v as K 0, for a 2 [0, 1]. By Lemma 2.15, detðv ax Þ and detðv as Þ do not vanish for a 2 [0, 1]. Since detðv 0x Þ ¼ detðv 0s Þ ¼ detðv Þ > 0, by continuity, detðv ax Þ and detðv as Þ stay positive for all a 2 [0, 1]. Moreover, by Theorem 2.4, this implies that all the eigenvalues of v ax and v as stay positive for all a 2 [0, 1]. Thus we obtain that all the eigenvalues of v 1x and v 1s are nonnegative. This proves the lemma. h

2dðv þ Þ ¼ ku1=2  u1=2 kF 6

ku  ekF ku  ekF : ¼ kmin ðu1=2 Þ kmin ðuÞ1=2

In addition, we derive from Lemmas 2.19 and 2.20 that

2dðv þ Þ 6

ku  ekF kmin ðuÞ

1=2

6

  ekF ku :  Þ1=2 kmin ðu

It follows from (3.9) that

 ¼ ðv þ dx Þ  ðv þ ds Þ ¼ v 2 þ v  ðdx þ ds Þ þ dx  ds u ¼ v 2 þ v  ðv 1  v Þ þ dx  ds ¼ e þ dx  ds

Lemma 3.4. Let x; s 2 intK and l > 0, then hx+, s+i = ltr (e).

and the substitution gives Proof. Due to (3.15) we may write

hxþ ; sþ i ¼

2dðv þ Þ 6

Dpffiffiffiffi

E pffiffiffiffi lPðwÞ1=2 ðv þ dx Þ; lPðwÞ1=2 ðv þ ds Þ

Using (3.9) we obtain

2dðv þ Þ 6

hv þ dx ; v þ ds i ¼ hv ; v i þ hv ; dx þ ds i þ hdx ; ds i ¼ hv ; v i þ hv ; v 1  v i þ hdx ; ds i ¼ trðeÞ þ hdx ; ds i: Since dx and ds are orthogonal with respect to the trace inner product, the lemma follows. h Next, we prove quadratic convergence to the target point (x(l), s(l)) when taking full NT-steps. According to (3.5), the v-vector after the step is given by:

v

Pðwþ Þ1=2 xþ :¼ pffiffiffiffi

l

"

# Pðwþ Þ1=2 sþ ; ¼ pffiffiffiffi

l

where w+ is the scaling point of x+ and s+.

kmin ðe þ dx  ds Þ

1=2

¼

kdx  ds kF ½1 þ kmin ðdx  ds Þ 1=2

:

Now we apply Lemma 2.13. Part (i) of this lemma implies that d2 is a upper bound for kk(ds  ds)k1, as we already established in the proof of Lemma 3.2. Also using part (ii) of Lemma 2.13 we may now write

¼ lhv þ dx ; v þ ds i:

þ

kdx  ds kF

ð3:16Þ

kdx  ds kF ½1 þ kmin ðdx  ds Þ 1=2

6

pffiffiffi pffiffiffi 2 12 2kdx þ ds k2F 2d pffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; 2 1d 1  d2

which implies the lemma. h As a result, the following corollary readily follows. pffiffiffi Corollary 3.7. If dðv Þ 6 1= 2, then the full NT-step is strictly feasible and d(v+) 6 d(v)2.

4. A full NT-step infeasible IPM A triple (x, y, s) is called an e-optimal solution of (CP) and (CD) if the norms of the residual vectors b  Ax and c  ATy  s do not

478

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

exceed e, and also the duality gap satisfies tr (x  s) 6 e. In this section we present an infeasible-start algorithm that generates an eoptimal solution of (CP) and (CD), if it exists, or establish that no such solution exists. 4.1. The perturbed problems We assume (CP) and (CD) have an optimal solution (x⁄, y⁄, s⁄) with vanishing duality gap, i.e., tr (x⁄  s⁄) = 0. As it is common for infeasible IPMs we start the algorithm with a triple (x0, y0, s0) and l0 > 0 such that

x0 ¼ fe;

y0 ¼ 0;

s0 ¼ fe;

l0 ¼ f 2 ;

ð4:1Þ

where f is a (positive) number such that

x þ s K fe:

ð4:2Þ

The initial values of the primal and dual residual vectors are denoted as r0p and r 0d , respectively. So we have

r 0p ¼ b  Ax0 ;

r 0d ¼ c  AT y0  s0 :

In general, we have r0p – 0 and r 0d – 0, i.e., the initial iterate is not feasible. The iterates generated by the algorithm will (in general) be infeasible for (CP) and (CD) as well, but they will be feasible for perturbed versions of (CP) and (CD) that we introduce in the sequel. For any m with 0 6 m 6 1 we consider the perturbed problem (4.3), defined by

n o min ðc  mr 0d ÞT x : b  Ax ¼ mr 0p ; x 2 K ;

ð4:3Þ

and its dual problem (4.4), given by

n o max ðb  mr 0p ÞT y : c  AT y  s ¼ mr 0d ; s 2 K :

ð4:4Þ

Note that these problems are defined in such a way that if (x, y, s) is feasible for (CPm) and (CDm) then the residual vectors for the given triple (x,y,s) with respect to the original problems (CP) and (CD) are mr 0p and mr0d , respectively. If m = 1 then x = x0 yields a strictly feasible solution of (4.3), and (y, s) = (y0, s0) a strictly feasible solution of (4.4). This means that if m = 1, then (4.3) and (4.4) are strictly feasible. Lemma 4.1. Let (CP) and (CD) be feasible and 0 < m 6 1. Then, the perturbed problems (4.3) and (4.4) satisfy the IPC. ; sÞ a feasible soluProof. Let  x be a feasible solution of (CP) and ðy  þ s ¼ c, s 2 K. Consider tion of (CD), i.e., A x ¼ b,  x 2 K and AT y

x ¼ ð1  mÞx þ mx0 ;

 þ my0 ; y ¼ ð1  mÞy

b  Ax ¼ b  A½ð1  mÞx þ mx0 ¼ b  ð1  mÞb  mAx0 ¼ mðb  Ax0 Þ ¼ mr 0p ; showing that x is strictly feasible for (4.3). In precisely the same way one shows that (y, s) is strictly feasible for (4.4). Thus we have shown that (4.3) and (4.4) satisfy the IPC. h Let (CP) and (CD) be feasible and 0 < m 6 1. Then Lemma 4.1 implies that the problems (4.3) and (4.4) are strictly feasible, and therefore their central paths exist. This means that for every l > 0 the system

x 2 K;

c  AT y  s ¼ mr 0d ; x  s ¼ le

s 2 K;

4.2. A full NT-step infeasible IPM algorithm We just established that if m = 1 and l = l0, then x = x0 and (y, s) = (y0, s0) are the l-centers of (4.3) and (4.4), respectively. These are our initial iterates. We measure proximity to the l-center of the perturbed problems by the quantity d(x, s; l) as defined in (3.13). So, initially we have d(x, s; l) = 0. In the sequel we assume that at the start of each iteration, just before the l- and m-update, d(x, s; l) is smaller than or equal to a (small) threshold value s > 0. This condition is certainly satisfied at the start of the first iteration. Since we then have d(x, s; l) = 0. Also at the start we have tr (x  s) = l0tr (e). Now we describe one (main) iteration of our algorithm. Suppose that for some m 2 (0, 1] we have x, and (y, s) satisfying the feasibility conditions (4.3) and (4.4) for l = ml0, and such that tr (x  s) = ltr (e) and d(x, s; l) 6 s. We reduce m to m+ = (1  h)m, with h 2 (0, 1), and find new iterate (x+, y+, s+) that satisfies (4.3) and (4.4) with m replaced by m+ and l by l+ = m+l0 = (1  h)l, and such that tr (x+  s+) = l+tr (e) and d(x+, s+; l+) 6 s. More in detail, every (main) iteration consists of a feasibility step and a few centering steps. The feasibility step serves to get iterates (xf, yf, sf) that are strictlypffiffiffifeasible for ðCPmþ Þ and ðCPmþ Þ, and such that dðxf ; sf ; lþ Þ 6 1= 2. In other words, (xf, yf, sf) belongs to the quadratic convergence neighborhood with respect to the l+-center of ðCPmþ Þ and ðCPmþ Þ. Hence, because the NT-step is quadratically convergent in that region, a few centering steps, starting from (xf, yf, sf) and targeting at the l+-center of ðCPmþ Þ and ðCPmþ Þ will generate the iterate (x+, y+, s+) that is feasible for ðCPmþ Þ and ðCPmþ Þ and satisfies tr (x+  s+) = l+tr (e) and d(x+, s+; l+) 6 s. A formal description of the algorithm is given in Algorithm 4.1. Recall that after each iteration the residuals and the duality gap are reduced by the factor (1  h). The algorithm stops if the norms of the residuals and the duality gap are less than the accuracy parameter e. Algorithm 4.1 A full NT-step infeasible IPM for SO

s ¼ ð1  mÞs þ ms0 :

Since x is the sum of the vectors ð1  mÞx 2 K and mx0 2 intK, we have x 2 intK. Moreover,

b  Ax ¼ mr 0p ;

has a unique solution. This solution is denoted as (x(l, m), y(l, m), s(l, m)). These are the l-centers of the perturbed problems (4.3) and (4.4). In the sequel the parameters l and m will always be in a one-to-one correspondence, according to l = ml0 = mf2. Therefore, we feel free to omit one parameter and denote (x(l, m), y(l, m), s(l, m)) simply as (x(m), y(m), s(m)). Due to the choice of the initial iterate, according to (4.1), we have x0  s0 = l0e. Hence x0 is the l0-center of the perturbed problem (CP1) and (y0, s0) the l0-center of the perturbed problem (CD1). In other words, (x(1), y(1), s(1)) = (x0, y0, s0).

ð4:3Þ ð4:4Þ ð4:5Þ

Input: accuracy parameter e > 0; update parameter h, 0 < h < 1; threshold parameter s > 0; initialization parameter f > 0. begin x :¼ fe; y :¼ 0; s :¼ fe; l :¼ l0 = f2; m :¼ 1. while max(tr (x  s), kb  AxkF, kc  ATy  skF) P e feasibility step: (x, y, s) :¼ (x, y, s) + (Dfx, Dfy, Dfs). update of l and m: l :¼ (1  h)l; m :¼ (1  h)m. centering steps: while d(x, s; l) P s (x, y, s) :¼ (x, y, s) + (Dx, Dy, Ds). endwhile endwhile end

479

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

To get the search directions Dfx and Dfs in the original x and s-space we use (4.8), which gives

4.3. Analysis of the feasibility step In this subsection, we define and analyze the feasibility step. This is the most difficult part of the analysis. In essence we follow the same chain of arguments as in Roos (2006), but at several places the analysis is simpler, more precise, and also more elegant. First, we describe the feasibility step in detail. The analysis will follow in the sequel. Suppose we have strictly feasible iterates (x, y, s) for (4.3) and (4.4). This means that (x, y, s) satisfies (4.3) and (4.4), with l = mf2. We need displacements Dfx, Dfy and Dfs such that

xf :¼ x þ Df x;

yf :¼ y þ Df y;

sf :¼ s þ Df s;

ð4:6Þ

are feasible for ðCPmþ Þ and ðCPmþ Þ. One may easily verify that (xf, yf, sf) satisfies (4.3) and (4.4), with m replaced by m+ and l by l+ = m+l0 = (1  h)l, only if the first two equations in the following system are satisfied.

AT Df y þ Df s ¼ hmr 0d ; PðuÞx  Pðu1 ÞDf s þ Pðu1 Þs  PðuÞDf x ¼ ð1  hÞle  PðuÞx  Pðu1 Þs: The third equation is inspired by the third equation in the system (3.4) that we used to define search directions for the feasible case, except that we target at the l+-centers of ðCPmþ Þ and ðCDmþ Þ. As in the feasible case, we use the NT-scaling scheme to guarantee that the above system has a unique solution. So we take u = w1/2, where w is the NT-scaling point of x and s. Then the third equation becomes

PðwÞ1=2 x  PðwÞ1=2 Df s þ PðwÞ1=2 s  PðwÞ1=2 Df x

f

PðwÞ1=2 Df x ; pffiffiffiffi

l

f

ds :¼

PðwÞ1=2 Df s pffiffiffiffi

l

ð4:8Þ

lv 

þ

f ds Þ

¼ ð1  hÞle  lv : 2

By multiplying both sides of this equation from left with l1L(v)1 this equation becomes f

f

dx þ ds ¼ ð1  hÞv 1  v : Thus, we arrive at the following system for the scaled search directions in the feasibility step:

pffiffiffiffi

lAPðwÞ1=2 dfx ¼ hmr0p ;

pffiffiffiffi  T Df y 1 f lAPðwÞ1=2 þ ds ¼ pffiffiffiffi hmPðwÞ1=2 r0d ;

l

f

f

dx þ ds ¼ ð1  hÞv 1  v :

l

The new iterates are obtained by taking a full step, as given by (4.6). Hence, we have

pffiffiffiffi lPðwÞ1=2 ðv þ dfx Þ; pffiffiffiffi f xf ¼ s þ Df s ¼ lPðwÞ1=2 ðv þ ds Þ:

xf ¼ x þ Df x ¼

ð4:10Þ

From the third equation in (4.9) we derive that f

f

f

f

ðv þ dx Þ  ðv þ ds Þ ¼ v 2 þ v  ½ð1  hÞv 1  v þ dx  ds f

f

¼ ð1  hÞe þ dx  ds :

Lemma 4.2. The iterate f f ð1  hÞe þ dx  ds 2 intK.

ð4:11Þ

(xf, yf, sf)

is

strictly

feasible

if

Proof. Just as in the proof of Lemma 3.3 we introduce a step length a with 0 6 a 6 1, and define

f

f

We then have v 0x ¼ v , v 1x ¼ v þ dx , and similarly v 0s ¼ v , v 1s ¼ v þ ds . f f From the third equation in (4.9), i.e., dx þ ds ¼ ð1  hÞv 1 þ v , it follows that

v ax  v as ¼ ðv þ adfx Þ  ðv þ adfs Þ ¼ v 2 þ av  ðdfx þ dfs Þ þ a2 dfx  dfs f f ¼ v 2 þ av  ½ð1  hÞv 1  v þ a2 dx  ds f f ¼ ð1  aÞv 2 þ að1  hÞe þ a2 dx  ds : f

f

f

f

If ð1  hÞe þ dx  ds 2 intK, we have dx  ds K  ð1  hÞe. Substituting this into the above equality gives

v ax  v as K ð1  aÞv 2 þ að1  hÞe  a2 ð1  hÞe ¼ ð1  aÞðv 2 þ að1  hÞeÞ: Since

with w denoting the scaling point of x and s, as defined in Proposition 2.8. With the vector v as defined in (3.5), the Eq. (4.7) can be restated as f ðdx

pffiffiffiffi

lPðwÞ1=2 dfx ; Df s ¼ lPðwÞ1=2 dfs :

v ax ¼ v þ adfx ; v as ¼ v þ adfs : ð4:7Þ

Due to this choice of u the coefficient matrix of the resulting system is exactly the same as in the feasible case, and hence it defines the feasibility step uniquely. By its definition, after the feasibility step the iterates satisfy the affine equations in (4.3) and (4.4), with m replaced by m+. The hard part in the analysis will be to guarantee that xf ; sf 2 intK pffiffiffi and to guarantee that the new iterate satisfies dðxf ; sf ; lþ Þ 6 1= 2. Let (x,y,s) denote the iterate at the start of an iteration with tr (x  s) = ltr (e) and d(x, s; l) 6 s. Recall that at the start of the first iteration this is certainly true, because tr (x0  s0) = l0tr (e) and d(x0, s0; l0) = 0. We scale the search directions, just as we did in the feasible case (cf. (3.6)), by defining

dx :¼

pffiffiffiffi

As we mentioned before, the analysis of the algorithm as presented below is much more difficult than in the feasible case. f The main reason for this is that the scaled search directions dx f and ds are not (necessarily) orthogonal (with respect to the trace inner product). Using the same arguments as in Sub Section 3.3 it follows from f (4.10) that xf and sf are strictly feasible if and only if v þ dx and f v þ ds belong to intK. Using this we have the following lemma.

ADf x ¼ hmr 0p ;

¼ ð1  hÞle  PðwÞ1=2 x  PðwÞ1=2 s:

Df x ¼

v 2 2 intK, we have v 2 þ að1  hÞe 2 intK. Hence,

v ax  v as K ð1  aÞðv 2 þ að1  hÞeÞ K 0;

for a 2 ½0; 1 :

By Lemma 2.15, it follows that detðv x and detðv as Þ do not vanish for a 2 [0, 1]. Since detðv 0x Þ ¼ detðv 0s Þ ¼ detðv Þ > 0, by continuity, detðv ax Þ and detðv as Þ stay positive for all a 2 [0, 1]. Moreover, by Theorem 2.4, this implies that all the eigenvalues of v ax and v as stay positive for all a 2 [0, 1]. Hence, we obtain that all the eigenvalues of v 1x and v 1s are nonnegative. This proves that if ð1  hÞeþ dfx  dfs 2 f f intK, then v þ dx 2 K and v þ ds int 2 K, i.e., the iterate (xf, yf, sf) is strictly feasible. This proves the lemma. h We proceed by deriving an upper bound for d(xf, sf; l+). Let wf be the scaling point of xf and sf. Denoting the v-vector after the feasibility step with respect to the l+-center as vf, we have, according to (3.5), aÞ

ð4:9Þ

vf

Pðwf Þ1=2 xf :¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi lð1  hÞ

"

# Pðwf Þ1=2 sf ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : lð1  hÞ

ð4:12Þ

480

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

Substitution of the above inequality into the inequality of Lemma 4.4 yields that

Lemma 4.3. One has

h i1=2 pffiffiffiffiffiffiffiffiffiffiffiffi f f : 1  hv f  Pðv þ dx Þ1=2 ðv þ ds Þ

 2  2  f f 2  f f 2 1 dx ds  kd k þ d 1  h   x F s  1h  4 F  f Ff  6 4dðv f Þ2 6 : f 2 f 2 kðdx ds Þ kd k 1 x F þkds kF   1  1   1h  2 1h

Proof. It follows from (4.12) and Lemma 2.10 that

pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi f lð1  hÞv ¼ Pðwf Þ1=2 sf  ðPðxf Þ1=2 sf Þ1=2 :

1

f

We have derived an upper bound for d(vf), in terms of kdx k2F þ f 2 kds kF . As our ultimate goal ispto ffiffiffi choose h, 0 < h < 1, as large as possible, such that dðv f Þ 6 1= 2, we need an upper bound for f f kdx k2F þ kds k2F . For the moment, let us define

Due to (4.10) and Lemma 2.9, we may write

 1=2 f f Pðxf Þ1=2 sf ¼ lP PðwÞ1=2 ðv þ dx Þ PðwÞ1=2 ðv þ ds Þ f

f

 lPðv þ dx Þ1=2 ðv þ ds Þ:

rp :¼ hmr0p ;

Thus, we obtain

r d :¼ hmr 0d ;

r :¼ ð1  hÞv 1  v :

Df y

With n :¼  l , the system (4.9) (by eliminating

i1=2 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi f pffiffiffiffih lð1  hÞv  l Pðv þ dfx Þ1=2 ðv þ dfs Þ

ð4:15Þ f ds )

reduces to

pffiffiffiffi lAPðwÞ1=2 dfx ¼ rp ;

and the statement of the lemma follows. h In the sequel we denote d(xf, sf; l+) shortly as d(vf). f

ð4:14Þ

ð4:16Þ

pffiffiffiffi T 1 lAPðwÞ1=2 n þ dfx ¼ r  pffiffiffiffi PðwÞ1=2rd :

ð4:17Þ

l

f

Lemma 4.4. If kkðdx  ds Þk1 < 1  h, then

Multiplying both sides of (4.17) from the left with using (4.16) it follows that

 f f 2 dx ds   1h   f Ff  : 4dðv f Þ2 6 kðdx ds Þ  1  1h 



pffiffiffiffi lAPðwÞ1=2 and



pffiffiffiffi 1 lAPðwÞAT n þ rp ¼ lAPðwÞ1=2 r  pffiffiffiffi PðwÞ1=2rd : l

1

Therefore, Proof. Since follows that intK.

f kkðdx

v

 f þ dx ,

f ds Þk1

v

< 1  h, from Lemma 4.2 and its proof, it f f f þ ds , and ðv þ dx Þ  ðv þ ds Þ belong to the

v þ dfxffi pffiffiffiffiffiffiffiffiffiffiffi

!1=2

1h

!

v þ dfs ffi pffiffiffiffiffiffiffiffiffiffiffi

1h

!

;

 :¼ u

v þ dfxffi pffiffiffiffiffiffiffiffiffiffiffi



1h

v þ dfs ffi pffiffiffiffiffiffiffiffiffiffiffi

:

1h

2dðv Þ ¼ ku

1=2

l

ð4:18Þ

u

l

ku  ekF ku  ekF kF 6 : ¼ kmin ðu1=2 Þ kmin ðuÞ1=2

To simplify the above expression we denote

P ¼ PðwÞ1=2 AT ðAPðwÞAT Þ1 APðwÞ1=2 :

Now, from Lemmas 2.19 and 2.20 we derive

2dðv f Þ 6

  pffiffiffiffi 1 lAPðwÞ1=2 r  pffiffiffiffi PðwÞ1=2rd  rp :

1 þ pffiffiffiffi PðwÞ1=2 AT ðAPðwÞAT Þ1rp :

By applying Lemma 2.18, we obtain 1=2



l

2dðv f Þ ¼ kv f  ðv f Þ1 kF ¼ ku1=2  u1=2 kF :

f

l

ðAPðwÞAT Þ1

1 1 f dx ¼ r  pffiffiffiffi PðwÞ1=2r d  pffiffiffiffi PðwÞ1=2 AT ðAPðwÞAT Þ1 l l    pffiffiffiffi 1  lAPðwÞ1=2 r  pffiffiffiffi PðwÞ1=2rd  rp l  h i 1 ¼ I  PðwÞ1=2 AT ðAPðwÞAT Þ1 APðwÞ1=2 r  pffiffiffiffi PðwÞ1=2r d

!

vf  u1/2 (cf. Lemma 4.3), we have

Then, since

1

Substitution into (4.17) gives

Let us, for the moment, denote

u :¼ P



ku  ekF kmin ðuÞ

1=2

6

Note that P is (the matrix of) the orthogonal projection (with respect to the trace inner product) to the row space of the matrix AP(w)1/2. We now may write

  ekF ku :  Þ1=2 kmin ðu

  1 1 f dx ¼ ½I  P r  pffiffiffiffi PðwÞ1=2r d þ pffiffiffiffi PðwÞ1=2 AT ðAPðwÞAT Þ1r p :

From the third equation in (4.9) it follows that f

f

f

f

f

l

f

 ¼ ðv þ dx Þ  ðv þ ds Þ ¼ v 2 þ v  ðdx þ ds Þ þ dx  ds ð1  hÞu f

f

f

f

¼ v 2 þ v  ðð1  hÞv 1  v Þ þ dx  ds ¼ ð1  hÞe þ dx  ds

l

 þ s ¼ c. Then we may write ; sÞ be such that Ax ¼ b and AT y Let ðx; y

and substitution into the above inequality gives

rp ¼ hmr 0p ¼ hmðb  Ax0 Þ ¼ hmAðx  x0 Þ;

   f  f f f dx  ds 1  h kdx  ds 1  hkF F 2dðv f Þ 6 6  1=2 ;  1=2 f f dfx dfs 1  kðdx  ds Þ1  hk1 kmin e þ 1h

    y0 Þ þ s  s0 : rd ¼ hmr 0d ¼ hmðc  AT y0  s0 Þ ¼ hm AT ðy f

which proves the lemma. h From the definition of the Frobenius norm, Lemmas 2.16, 2.12, we have

     1  f  f  f 2 f  f f 2 kðdx  ds Þ 6 dx  ds  6 ðdx Þ þ ðds Þ  1 F F 2     1  1   f 2  f 2 f 2 f 2 kdx kF þ kds kF : 6 ðdx Þ  þ ðds Þ  6 2 2 F F

Substituting the expressions of r p and rd into the expression for dx , we obtain

   hm f   y0 Þ þ s  s0 dx ¼ ½I  P r  pffiffiffiffi PðwÞ1=2 AT ðy

l

hm þ pffiffiffiffi PPðwÞ1=2 ðx  x0 Þ: ð4:13Þ

l

Since I  P is the orthogonal projection to the null space of AP(w)1/2 we have

481

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

 2  2     1=2 ðx  x0 Þ ¼ PðwÞ1=2 ðx0  xÞ PðwÞ

  y0 Þ ¼ 0: ½I  P PðwÞ1=2 AT ðy

F

Hence, it follows that

¼ hPðwÞ ðx  xÞ; x0  xi

 



hm hm f dx ¼ I  P r  pffiffiffiffi PðwÞ1=2 s  s0 þ pffiffiffiffi PPðwÞ1=2 ðx  x0 Þ:

l

 ðx0  xÞi

To proceed we further simplify the above expression by defining

hm u ¼ pffiffiffiffi PðwÞ1=2 ðx  x0 Þ;

l

hm u ¼ pffiffiffiffi PðwÞ1=2 ðs  s0 Þ: s

l

6 hPðwÞ1 ðx0  xÞ; fei ¼ fhPðwÞ1 e; x0  xi

ð4:19Þ

¼ fhPðwÞ1 e; fei  fhPðwÞ1 e; fe  ðx0  xÞi 6 f2 trðw2 Þ:

Then we may write f

dx ¼ ½I  P ðr  us Þ þ Pux : f

For ds we obtain, by using the third equation in (4.9) and the definition (4.15) of r, f ds

¼ r 

f dx

s

x

s

kux k2F þ kus k2F 6 h2 mtrðw2 þ w2 Þ:

ð4:23Þ

¼ r  ½I  P r þ ½I  P u  Pu ¼ ½I  P u þ Pðr  u Þ:

f

Therefore, using the orthogonality (with respect to the trace inner product) of vectors with different subscripts, we may write f f kdx k2F þ kds k2F ¼ kr1  us1 k2F þ kux2 k2F þ kus1 k2F þ kr 2  ux2 k2F

¼ kr1 k2F þ kus1 k2F  2hr 1 ; us1 i þ kux2 k2F þ kus1 k2F þ kr2 k2F þ kux2 k2F  2hr 2 ; ux2 i ¼ krk2F þ 2kux2 k2F þ 2kus1 k2F  2hr1 ; us1 i  2hr2 ; ux2 i: Using the Cauchy–Schwartz inequality, and the properties of orthogonal projection, we further obtain

 2  2  2    f f kdx k2F þ ds  6 kr k2F þ 2ux2 F þ 2us1 F þ 2kr 1 kF us1 F F   þ 2kr 2 kF ux2 F  2  2 6 kr k2F þ 2ux2 F þ 2us1 F þ kr 1 k2F þ kus1 k2F þ kr 2 k2F þ kux2 k2F   6 2kr k2F þ 3 kux k2F þ kus k2F :

¼ 4ð1  hÞ2 dðv Þ2 þ h2 r;

l

:

Proof. For the moment, let u :¼ (P(x1/2)s)1/2. Then, by Proposition 2.8, w = P(x1/2)u. Using that P(x1/2) is self-adjoint, and also Lemma 2.17, we obtain

trðw2 Þ ¼ hPðx1=2 Þu; Pðx1=2 Þui ¼ hu; PðxÞui 6 kmax ðuÞtrðPðxÞuÞ: Using the same arguments as above and the fact that P(x)e = x2 we may write

trðPðxÞuÞ ¼ trðPðxÞu  eÞ ¼ hPðxÞu; ei ¼ hu; PðxÞei ¼ hu; x2 i 6 kmax ðuÞ trðx2 Þ:

ð4:20Þ

kmax ðPðx1=2 Þs1 Þtrðx2 Þ ¼

and

trðx2 Þ trðx2 Þ : ¼ kmin ðPðx1=2 ÞsÞ lkmin ðv Þ2

Thus, we obtain

trðw2 Þ 6

trðx2 Þ

lkmin ðv Þ2

:

Recalling that w1 is the scaling point of s and x, it follows from the above inequality, by interchanging the role of x and s, that

ð4:21Þ

  2    kPðwÞ1=2 ðx  x0 Þk2F þ PðwÞ1=2 ðs  s0 Þ : F

trðw2 Þ 6

trðs2 Þ

lkmin ðv Þ2

Let (x⁄, y⁄, s⁄) be the optimal solution satisfying (4.2). It follows that  ¼ y and Ax⁄ = b and ATy⁄ + s⁄ = c. Therefore, we may choose x ¼ x , y s ¼ s . Since x⁄ is feasible for (CP) we have x K 0. Also s K 0. Hence we have 0K x K x þ s K fe, or equivalently 0K  xK fe. In a similar way we derive that 0KsK fe. Therefore, it follows that

0K s0  sK fe:

We first consider the term kPðwÞ1=2 ðx  x0 Þk2F . Using that P(w)1/2 is self-adjoint with respect to the inner product and P(w)e = w2, we have

:

By adding the last two inequalities we obtain

trðw2 þ w2 Þ 6

ð4:22Þ

0K x0  xK fe;

lkmin ðv Þ2

Due to P(s1/2)x  P(x1/2)s  (P(w)1/2s)2  (P(w1/2)x)2 = lv2 Proposition 2.3 (ii), we have

where r = tr (e) is the rank of the associated Euclidean Jordan algebra. On the other hand, due to (4.19) we have

h2 m2

trðx þ sÞ2

trðw2 Þ 6 kmax ðPðx1=2 Þs1 Þtrðx2 Þ:

 2  2 kr k2F ¼ ð1  hÞv 1  v F ¼ ð1  hÞðv 1  v Þ  hv F  2 ¼ ð1  hÞ2 v 1  v F þ h2 kv k2F

¼

trðw2 þ w2 Þ 6

Combining the above inequalities we obtain

Since v and v1  v are orthogonal (with respect to the trace inner product) and kv k2F ¼ trðeÞ, we have

kus k2F

Lemma 4.5. One has

f ds ¼ us1 þ r 2  ux2 :

dx ¼ r1  us1 þ ux2 ;

þ

Similarly, it follows that kPðwÞ1=2 ðs  s0 Þk2F 6 f2 trðw2 Þ. Substitution of the last two inequalities into (4.22) gives kux k2F þ kus k2F 2 2 2 6 h ml f trðw2 þ w2 Þ. By using l = ml0 = mf2 we derive

x

We denote ½I  P r ¼ r1 and Pr ¼ r2 , and use similar notations for the f projections of ux and us. Then from the above expressions for dx and f ds we derive that

kux k2F

0

¼ hPðwÞ1 ðx0  xÞ; fei  hPðwÞ1 ðx0  xÞ; fe

l

x

F

1

trðx2 Þ þ trðs2 Þ

lkmin ðv Þ2

:

Since x; s 2 K, we have tr (x  s) P 0. Together with the fact that tr (u2) 6 tr (u)2 for each u 2 K (cf. Lemma 2.11), we obtain

trðx2 Þ þ trðs2 Þ 6 trðx2 Þ þ trðs2 Þ þ 2 trðx  sÞ ¼ trððx þ sÞ2 Þ 6 trðx þ sÞ2 : Substitution yields

trðw2 þ w2 Þ 6

trðx þ sÞ2

lkmin ðv Þ2

;

which completes the proof. h

482

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

Substituting the result of the above lemma into inequality (4.23), we obtain

kux k2F þ kus k2F 6 h2 m

trðx þ sÞ2

lkmin ðv Þ2

ð4:24Þ

:

Concluding, we have, by substituting (4.21) and (4.24) into (4.20), and using the fact that l = ml0 = mf2,

h i trðx þ sÞ2 f f kdx k2F þ kds k2F 6 2 4ð1  hÞ2 dðv Þ2 þ h2 r þ 3h2 m lkmin ðv Þ2 h i ¼ 2 4ð1  hÞ2 dðv Þ2 þ h2 r þ 3h2 trðx þ sÞ2 f2 kmin ðv Þ2 ; ð4:25Þ where r is the rank of the associated Euclidean Jordan algebra. To continue, we need an upper bound for tr (x + s), and a lower bound for kmin(v), which are contained in the following two lemmas.

Proof. By the definition of d(v) (cf. (3.13)), we have

vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u r uX 1 2d ¼ kv  v kF ¼ t ðki ðv Þ  1=ki ðv ÞÞ2 : i¼1

Since ki(v) > 0, we derive

2dki ðv Þ 6 1  ki ðv Þ2 6 2dki ðv Þ; which implies

ki ðv Þ2  2dki ðv Þ  1 6 0 6 ki ðv Þ2 þ 2dki ðv Þ  1: Rewriting this as

ðki ðv Þ  dÞ2  1  d2 6 0 6 ðki ðv Þ þ dÞ2  1  d2 ; we obtain

ðki ðv Þ  dÞ2 6 1 þ d2 6 ðki ðv Þ þ dÞ2 ; which implies

Lemma 4.6. Let x and (y,s) be feasible for the perturbed problems (CPm) and (CDm), respectively, and tr (x  s) = ltr (e). With (x0, y0, s0) as defined in (4.1) and f as in (4.2), we then have tr (x + s) 6 2fr, where r = tr (e) is the rank of the associated Euclidean Jordan algebra. Proof. Let (x⁄, y⁄, s⁄) be the optimal solution satisfying (4.2). Then from the feasibility conditions of the perturbed problems (4.3) and (4.4), it is easily seen that

A y  my0  ð1  mÞy



þ s  ms0  ð1  mÞs ¼ 0:





tr x  mx0  ð1  mÞx  s  ms0  ð1  mÞs ¼ 0: By expanding the above equality and using the fact that tr (x⁄  s⁄) = 0, we obtain

ð4:26Þ

Since (x0, y0, s0) is as defined in (4.1), we have

trðx0  sÞ þ trðs0  xÞ ¼ ftrðx þ sÞ; trðx0  s0 Þ ¼ f2 trðeÞ; 0



0



qffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 þ d2 6 ki ðv Þ 6 d þ 1 þ d2 ¼ qðdÞ:

The lower bound above can be written as

qffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 þ d2 ¼



1 1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ ; 1 þ d2 qðdÞ

which proves the lemma. h By substituting the results of Lemmas 4.6 and 4.7 into (4.25), f f with d :¼ d(v), we derive an upper bound for kdx k2F þ kds k2F as follows.



This implies that



d þ



m trðx0  sÞ þ trðs0  xÞ ¼ trðx  sÞ þ m2 trðx0  s0 Þ  ð1  mÞtrðx  s þ s  x Þ þ mð1  mÞtrðx0  s þ s0  x Þ:

qffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 þ d2 6 ki ðv Þ þ d:

Thus, we obtain the bounds for ki(v)

d þ

A x  mx0  ð1  mÞx ¼ 0; T

ki ðv Þ  d 6 jki ðv Þ  dj 6



trðx  s Þ þ trðs  x Þ ¼ ftrðx þ s Þ:

h i ð2frÞ2 qðdÞ2 f f kdx k2F þ kds k2F 6 2 4ð1  hÞ2 d2 þ h2 r þ 3h2 f2 h i ¼ 2 4ð1  hÞ2 d2 þ h2 r þ 12h2 r 2 qðdÞ2 ;

where r = tr (e) is the rank of the associated Euclidean Jordan algebra. 4.4. Choosing the update parameter h We want to choose h, 0 < h < 1, as large as possible, and such that (xf, yf, sf) lies in the quadratic convergence neighborhood with respect to the l+-center pffiffiffi of the perturbed problems ðCPmþ Þ and ðCPmþ Þ, i.e., dðv f Þ 6 1= 2. By (4.14), we derive that this is the case when 1 4

 f 2 f 2 2 kdx kF þkds kF 1h 2

Due to (4.2) we have tr (x⁄ + s⁄) 6 ftr (e). Furthermore, tr (x  s) = ltr (e) = mf2tr (e). Finally, tr (x  s⁄ + s  x⁄) P 0. Substitution of the above equations into (4.26) gives

mftrðx þ sÞ 6 mf2 trðeÞ þ m2 f2 trðeÞ þ mð1  mÞf2 trðeÞ ¼ 2mf2 trðeÞ; which implies the lemma. h

ð4:27Þ

1  12

2

kdfx kF þkdfs kF

6 2:

1h f

f

kd k2 þkd k2

s F F Considering x 1h as a single term, and performing some elementary calculations, we obtain that

f

f

pffiffiffi kdx k2F þ kds k2F 6 2 3  2 1:4641: 1h

ð4:28Þ

By (4.27), the above inequality holds if Lemma 4.7 (cf. Roos et al., 2006, Lemma II.62). If d :¼ d(v) is defined by (3.13), then

1 6 kmin ðv Þ 6 kmax ðv Þ 6 qðdÞ; qðdÞ where qðdÞ :¼ d þ

pffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 þ d2 .

h i pffiffiffi 2 4ð1  hÞ2 d2 þ h2 r þ 12h2 r 2 qðdÞ2 6 ð2 3  2Þð1  hÞ: Choosing s = 1/16, one may easily verify that if (Noting the fact that the left hand side of the above inequality is increasing in d, we may tighten the above inequality by substituting d with its upper bound s = 1/16),

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484



1 ; 4r

ð4:29Þ

then the above inequality is satisfied. Moreover, using (4.13) and (4.28) we have

       pffiffiffi 1   f  f 2  f 2 f  kðdx  ds Þ 6 dx  þ ds  6 ð 3  1Þð1  hÞ < 1  h; 2 1 F F which, by Lemma 4.2, means that (xf, yf, sf) are strictly feasible. Thus, we have found a desired update parameter h. 4.5. Iteration bound We have found that if at the start of an iteration the iterate satisfies d(x, s; l) 6 s, with s = 1/16, then after the feasibility step, with h as defined in p (4.29), the iterate is strictly feasible and satisfies ffiffiffi dðxf ; sf ; lþ Þ 6 1= 2, i.e., (xf, yf, sf) lies in the quadratic convergence neighborhood with respect to the l+-center of the perturbed problems ðCPmþ Þ and ðCPmþ Þ. After the feasibility step we perform a few centering steps in order to get iterate (x+, y+, s+) which satisfies x+Ts+ = nl+ and d(x+, s+; l+) 6 s. By Corollary 3.7, after k centering steps we will have the iterate (x+, y+, s+) that is still feasible for ðCPmþ Þ and ðCPmþ Þ and such that

 pffiffiffi2k dðxþ ; sþ ; lþ Þ 6 1= 2 : From this one easily deduces that d(x+, s+; l+) 6 s will hold after at most

      1 1 ¼ 1 þ log2 log2 log2 log2 2

s

s

ð4:30Þ

centering steps. According to (4.30), and since s = 1/16, at most three centering steps then suffice to get iterate (x+, y+, s+) that satisfies d(x+, s+; l+) 6 s again. So each main iteration consists of at most four so-called inner iterations. In each main iteration both the duality gap and the norms of the residual vectors are reduced by the factor (1  h). Hence, using tr (x0  s0) = rf2, the total number of main iterations is bounded above by

1 log h

  n o   max rf2 ; r 0p  ; kr0d kF F

e

:

Due to (4.29) and the fact that we need at most four inner iterations per main iteration, we may state without further proof the main result of this paper. Theorem 4.8. If (CP) has an optimal solution x⁄ and (CD) has an optimal solution (y⁄, s⁄), which satisfy tr (x⁄  s⁄) = 0 and x þ s K fe for some f > 0, then after at most

16r log

    o n   max rf2 ; r 0p  ; r 0d F F

e

inner iterations the Algorithm 4.1 finds an e-optimal solution of (CP) and (CD). Here r = tr (e) is the rank of the associated Euclidean Jordan algebra. As for LO, the iteration bound in Theorem 4.8 is derived under the assumption that there exists an optimal solution pair (x⁄, y⁄, s⁄) of (CP) and (CD) with vanishing duality gap and satisfying x þ s K fe. During the course of the algorithm, if at some main iteration, the proximity measure d after the feasibility step exceeds pffiffiffi 1= 2, then it tells us that the above assumption does not hold. It

483

may happen that the value of f has been chosen too small. In this case one might run the algorithm once more with a larger value of f.

5. Concluding remarks Using Jordan algebras, we generalize the full-Newton step IIPM for LO of Roos (2006) to symmetric optimization, which unifies the analysis for LO, SOCO and SDO. The underline method can also be viewed as a homotopy method, which turns out to have many good properties. First, as the name suggests, it uses full steps (instead of damped steps), so no need to calculate the step length. Second, the iterates always lie in the quadratic convergence neighborhood with respect to some perturbed problems. Third, during the solution process, both ‘‘feasibility’’ and ‘‘optimality’’ are improved at the same rate, which is also credited by Potra (1996). Finally, the order of the iteration bound coincides with the bound derived for LO, which is the currently best known iteration bound for symmetric optimization. In this paper, we use NT-direction. A natural generalization is to use directions in the commutative class, or more generally the directions in MZ-family (maybe with different proximity measures). Another topic for further research is to consider largeupdate variants of the algorithm, since such methods are much more efficient in practice. Finally, a question that might be considered is whether full step methods can be made efficient by using dynamic updates of the barrier parameter. This will not improve the theoretical complexity, but it will enhance the practical performance of the algorithm significantly. Acknowledgements The first author Supported by NSFC (Grant No. 11001124), the second author wish to thank Shahrekord University for financial support. References Ben-Tal, A., Nemirovski, A., 2001. Lectures on modern convex optimization. MPS/ SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, Analysis, Algorithms, and Engineering Applications. de Klerk, E., 2002. Aspects of semidefinite programming. Applied Optimization. Interior Point Algorithms and Selected Applications, vol. 65. Kluwer Academic Publishers, Dordrecht. Faraut, J., Korányi, A., 1994. Analysis on symmetric cones. Oxford Mathematical Monographs. Oxford Science Publications/The Clarendon Press Oxford University Press, New York. Faybusovich, L., 1997a. Euclidean Jordan algebras and interior-point algorithms. Positivity 1 (4), 331–357. Faybusovich, L., 1997b. Linear systems in Jordan algebras and primal-dual interiorpoint algorithms. Journal of Computational and Applied Mathematics 86 (1), 149–175. special issue dedicated to William B. Gragg (Monterey, CA, 1996). Faybusovich, L., 2002. A Jordan-algebraic approach to potential-reduction algorithms. Mathematische Zeitschrift 239 (1), 117–129. Güler, O., 1996. Barrier functions in interior point methods. Mathematics of Operations Research 21 (4), 860–885. Kojima, M., Megiddo, N., Mizuno, S., 1993. A primal-dual infeasible-interior-point algorithm for linear programming. Mathematical Programming 61 (3, Ser. A), 263–280. Lustig, I.J., 1990/91. Feasibility issues in a primal-dual interior-point method for linear programming. Mathematical Programming 49 (2, Ser. A), 145–162. Mizuno, S., 1994. Polynomiality of infeasible-interior-point algorithms for linear programming. Mathematical Programming 67 (1, Ser. A), 109–119. Nesterov, Y., Nemirovskii, A., 1994. Interior-point polynomial algorithms in convex programming. SIAM Studies in Applied Mathematics. Society for Industrial and Applied Mathematics, vol. 13. SIAM, Philadelphia, PA. Nesterov, Y.E., Todd, M.J., 1997. Self-scaled barriers and interior-point methods for convex programming. Mathematics of Operations Research 22 (1), 1–42. Nesterov, Y.E., Todd, M.J., 1998. Primal-dual interior-point methods for self-scaled cones. SIAM Journal on Optimization 8 (2), 324–364. electronic. Potra, F.A., 1996. An infeasible-interior-point predictor–corrector algorithm for linear programming. SIAM Journal on Optimization 6 (1), 19–32.

484

G. Gu et al. / European Journal of Operational Research 214 (2011) 473–484

Rangarajan, B.K., 2006. Polynomial convergence of infeasible-interior-point methods over symmetric cones. SIAM Journal on Optimization 16 (4), 1211– 1229. electronic. Roos, C., 2006. A full-Newton step O(n) infeasible interior-point algorithm for linear optimization. SIAM Journal on Optimization 16 (4), 1110–1136. Roos, C., Terlaky, T., Vial, J.-P., 2006. Interior point methods for linear optimization, second ed. Theory and algorithms for linear optimization Springer, New York [Wiley, Chichester, 1997; MR1450094]. Schmieta, S.H., Alizadeh, F., 2001. Associative and Jordan algebras, and polynomial time interior-point algorithms for symmetric cones. Mathematics of Operations Research 26 (3), 543–564.

Schmieta, S.H., Alizadeh, F., 2003. Extension of primal-dual interior point algorithms to symmetric cones. Mathematical Programming 96 (3, Ser. A), 409–438. Sturm, J.F., 2000. Similarity and other spectral relations for symmetric cones. Linear Algebra and its Applications 312 (1–3), 135–154. Vieira, M.V.C., 2007. Jordan algebraic approach to symmetric optimization. Ph.D. thesis, Delft University of Technology. Zhang, Y., 1994. On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem. SIAM Journal on Optimization 4 (1), 208–227.