A note on the convergence of alternating proximal gradient method


Applied Mathematics and Computation 228 (2014) 258–263


Miantao Chao*, Caozong Cheng

Department of Mathematics, Beijing University of Technology, 100124 Beijing, China

* Corresponding author. E-mail addresses: [email protected] (M. Chao), [email protected] (C. Cheng).

Keywords: Alternating proximal gradient method; Alternating direction method of multipliers; Strongly convex functions; Global convergence

Abstract. We consider a class of linearly constrained separable convex programming problems whose objective functions are the sum of $m$ convex functions without coupled variables. The alternating proximal gradient method is an effective method for the case $m = 2$, but it is unknown whether its convergence can be extended to the general case $m \ge 3$. This note shows the global convergence of this extension when the involved functions are strongly convex.

1. Introduction

We consider the following convex minimization model with linear constraints and separable objective function:

\[
\min \; \sum_{i=1}^{m} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{m} A_i x_i = b, \qquad x_i \in X_i, \; i = 1, \ldots, m, \tag{1}
\]

where $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$ ($i = 1, \ldots, m$) are closed proper convex functions, $X_i \subseteq \mathbb{R}^{n_i}$ ($i = 1, \ldots, m$) are closed convex sets, $A_i \in \mathbb{R}^{l \times n_i}$ ($i = 1, \ldots, m$) are given matrices, and $b \in \mathbb{R}^l$ is a given vector. Throughout, the solution set of (1) is assumed to be nonempty. Our discussion focuses on the particular case of (1) where $m \ge 3$.

A fundamental method for solving (1) is the alternating direction method of multipliers (ADMM), presented originally in [1,2]. The standard ADMM iterative scheme is

\[
\left\{
\begin{aligned}
x_1^{k+1} &\in \arg\min_{x_1 \in X_1} \left\{ f_1(x_1) + \frac{1}{2\mu} \Big\| A_1 x_1 + \sum_{j=2}^{m} A_j x_j^k - b - \mu \lambda^k \Big\|^2 \right\}, \\
x_i^{k+1} &\in \arg\min_{x_i \in X_i} \left\{ f_i(x_i) + \frac{1}{2\mu} \Big\| \sum_{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b - \mu \lambda^k \Big\|^2 \right\}, \quad i = 2, \ldots, m-1, \\
x_m^{k+1} &\in \arg\min_{x_m \in X_m} \left\{ f_m(x_m) + \frac{1}{2\mu} \Big\| \sum_{j=1}^{m-1} A_j x_j^{k+1} + A_m x_m - b - \mu \lambda^k \Big\|^2 \right\}, \\
\lambda^{k+1} &= \lambda^k - \frac{1}{\mu} \left( \sum_{j=1}^{m} A_j x_j^{k+1} - b \right).
\end{aligned}
\right. \tag{2}
\]
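To see what a block update of (2) actually costs, consider the simplest strongly convex case $f_i(x) = \frac{\eta_i}{2}\|x - c_i\|^2$ with $X_i = \mathbb{R}^{n_i}$. The following sketch (our illustration, not code from the paper; all names are ours) shows that each ADMM subproblem then requires solving a linear system involving $A_i^T A_i$:

```python
# Illustrative ADMM x_i-subproblem of scheme (2) for the quadratic case
# f_i(x) = (eta_i/2) * ||x - c_i||^2 (our sketch, not from the paper).
import numpy as np

def admm_block_update(i, A, c, eta, b, mu, lam, x_new, x_old):
    """One x_i update of (2): blocks j < i already updated (x_new),
    blocks j > i still at the old iterate (x_old)."""
    m = len(A)
    rest = (sum(A[j] @ x_new[j] for j in range(i))
            + sum(A[j] @ x_old[j] for j in range(i + 1, m)) - b - mu * lam)
    n = A[i].shape[1]
    # optimality: eta_i (x - c_i) + (1/mu) A_i^T (A_i x + rest) = 0
    M = eta[i] * np.eye(n) + (A[i].T @ A[i]) / mu
    return np.linalg.solve(M, eta[i] * c[i] - (A[i].T @ rest) / mu)
```

The APGM recalled below replaces this inner linear solve by a single gradient step on the quadratic penalty followed by a proximal step on $f_i$ alone.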


The ADMM has been well studied in the literature for the special case $m = 2$. Without further assumptions, the convergence of (2) for the general case $m \ge 3$ remains open, although its efficiency has been verified empirically in [7,8]. In [3], the convergence of (2) is shown under the condition that all $f_i$ are strongly convex. When each convex function $f_i$ in (1) has a particular structure and the step size $1/\mu$ in the update of $\lambda^{k+1}$ is sufficiently small to fulfill a certain error bound, the resulting scheme is convergent (see [5]). He et al. [4] showed that the resulting sequence is convergent if the output of (2) is further corrected by a substitution procedure.

Based on the framework of ADMM (2), Ma [6] proposed an alternating proximal gradient method (APGM) for problem (1) in the special case $m = 2$, as follows:

\[
\left\{
\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1 \in X_1} \left\{ f_1(x_1) + \frac{1}{2\mu s_1} \Big\| x_1 - \big[ x_1^k - s_1 A_1^T ( A_1 x_1^k + A_2 x_2^k - b - \mu \lambda^k ) \big] \Big\|^2 \right\}, \\
x_2^{k+1} &= \arg\min_{x_2 \in X_2} \left\{ f_2(x_2) + \frac{1}{2\mu s_2} \Big\| x_2 - \big[ x_2^k - s_2 A_2^T ( A_1 x_1^{k+1} + A_2 x_2^k - b - \mu \lambda^k ) \big] \Big\|^2 \right\}, \\
\lambda^{k+1} &= \lambda^k - \frac{1}{\mu} \left( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \right),
\end{aligned}
\right. \tag{3}
\]

where $s_1$ and $s_2$ are the step sizes for the proximal gradient steps.
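A routine completing-the-square identity, which we record here for clarity (it is not stated in the original note), shows that each subproblem of (3) is exactly a proximal step. For the first update,

\[
x_1^{k+1} = \operatorname{prox}_{t g}\big( x_1^k - s_1 A_1^T ( A_1 x_1^k + A_2 x_2^k - b - \mu \lambda^k ) \big), \qquad g := f_1 + \iota_{X_1}, \; t := \mu s_1,
\]

where $\iota_{X_1}$ is the indicator function of $X_1$ and $\operatorname{prox}_{t g}(v) := \arg\min_x \big\{ g(x) + \frac{1}{2t} \|x - v\|^2 \big\}$.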

It is easy to see that the subproblems in APGM (3) are easier to solve than the subproblems in ADMM (2). A natural idea for solving (1) is to extend the APGM (3) from the special case $m = 2$ to the general case $m \ge 3$. This yields the following scheme for (1) when $m \ge 3$:

\[
\left\{
\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1 \in X_1} \left\{ f_1(x_1) + \frac{1}{2\mu s_1} \bigg\| x_1 - \Big[ x_1^k - s_1 A_1^T \Big( \sum_{j=1}^{m} A_j x_j^k - b - \mu \lambda^k \Big) \Big] \bigg\|^2 \right\}, \\
x_i^{k+1} &= \arg\min_{x_i \in X_i} \left\{ f_i(x_i) + \frac{1}{2\mu s_i} \bigg\| x_i - \Big[ x_i^k - s_i A_i^T \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b - \mu \lambda^k \Big) \Big] \bigg\|^2 \right\}, \quad i = 2, \ldots, m-1, \\
x_m^{k+1} &= \arg\min_{x_m \in X_m} \left\{ f_m(x_m) + \frac{1}{2\mu s_m} \bigg\| x_m - \Big[ x_m^k - s_m A_m^T \Big( \sum_{j=1}^{m-1} A_j x_j^{k+1} + A_m x_m^k - b - \mu \lambda^k \Big) \Big] \bigg\|^2 \right\}, \\
\lambda^{k+1} &= \lambda^k - \frac{1}{\mu} \left( \sum_{j=1}^{m} A_j x_j^{k+1} - b \right),
\end{aligned}
\right. \tag{4}
\]

where $\mu > 0$ and $s_i > 0$, $i = 1, 2, \ldots, m$.
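To make the update order in (4) concrete, here is a minimal NumPy sketch (our illustration, not code from the paper). It assumes the simplest strongly convex case $f_i(x) = \frac{\eta_i}{2}\|x - c_i\|^2$ with $X_i = \mathbb{R}^{n_i}$, so that each subproblem is a closed-form proximal step; all names below are ours.

```python
# Minimal sketch of the extended APGM (4) for the special case
# f_i(x) = (eta_i/2) * ||x - c_i||^2 and X_i = R^{n_i} (illustrative only).
import numpy as np

def apgm(A, c, eta, b, mu, s, iters=5000):
    """A: list of m matrices A_i; c, eta: data of the quadratic f_i;
    b: right-hand side vector; mu > 0 and s_i > 0 as in scheme (4)."""
    m = len(A)
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros_like(b)
    for _ in range(iters):
        x_old = [xi.copy() for xi in x]
        for i in range(m):
            # blocks j < i use the new iterates, blocks j >= i the old ones
            r = (sum(A[j] @ x[j] for j in range(i))
                 + sum(A[j] @ x_old[j] for j in range(i, m)) - b - mu * lam)
            v = x_old[i] - s[i] * (A[i].T @ r)   # gradient step on the penalty
            # proximal step: argmin_x f_i(x) + ||x - v||^2 / (2*mu*s_i),
            # closed-form for the quadratic f_i above
            x[i] = (v + mu * s[i] * eta[i] * c[i]) / (1.0 + mu * s[i] * eta[i])
        lam = lam - (sum(A[i] @ x[i] for i in range(m)) - b) / mu
    return x, lam
```

Theorem 3 below guarantees global convergence of this loop once $\mu$ exceeds $\max_{1 \le i \le m} (3m+1)\|A_i\|^2/\eta_i$; a driver choosing $\mu$ that way is sketched after the proof of Theorem 3.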

Similar to the ADMM, we want to investigate whether the APGM's convergence can be extended to the general case, i.e., the case $m \ge 3$. In this paper, we show that it is globally convergent for the general case under the strong convexity assumption on the $f_i$ ($i = 1, 2, \ldots, m$).

2. Preliminaries

Let $\|\cdot\|$ denote the Euclidean norm. For any positive definite matrix $M$, we denote by $\|\cdot\|_M$ the $M$-norm, i.e., $\|x\|_M := \sqrt{x^T M x}$. If $M$ is the product of a positive parameter $\beta$ and the identity matrix $I$, i.e., $M = \beta I$, we use the simpler notation $\|\cdot\|_M = \|\cdot\|_\beta$. Let $f : \mathbb{R}^n \to (-\infty, +\infty]$. If the domain of $f$, denoted by $\operatorname{dom} f := \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$, is not empty, $f$ is said to be proper. We say that $f$ is convex if

\[
f(tx + (1-t)y) \le t f(x) + (1-t) f(y), \qquad \forall x, y \in \mathbb{R}^n, \; \forall t \in [0, 1].
\]

Furthermore, $f$ is said to be strongly convex with modulus $\eta > 0$ if

\[
f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{1}{2} \eta \, t (1-t) \|x - y\|^2, \qquad \forall x, y \in \mathbb{R}^n, \; \forall t \in [0, 1].
\]

For a convex function $f$, the subdifferential of $f$ is the set-valued operator defined by

\[
\partial f(x) := \{ \xi \in \mathbb{R}^n \mid f(y) \ge f(x) + \langle y - x, \xi \rangle, \; \forall y \in \operatorname{dom} f \}.
\]

If a proper function $f$ is strongly convex with modulus $\eta > 0$, then

\[
\langle s_1 - s_2, \, x_1 - x_2 \rangle \ge \frac{1}{2} \eta \|x_1 - x_2\|^2, \qquad \forall s_1 \in \partial f(x_1), \; \forall s_2 \in \partial f(x_2).
\]
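For a concrete instance (an illustration we add here, not part of the original note), take $f(x) = \frac{\eta}{2}\|x\|^2$: it is strongly convex with modulus $\eta$, its subdifferential is the singleton $\partial f(x) = \{\eta x\}$, and

\[
\langle \eta x_1 - \eta x_2, \, x_1 - x_2 \rangle = \eta \|x_1 - x_2\|^2 \ge \frac{1}{2} \eta \|x_1 - x_2\|^2,
\]

so the inequality above holds with room to spare.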

The Lagrange function of (1) is given by

\[
L(x_1, x_2, \ldots, x_m; \lambda) = \sum_{i=1}^{m} f_i(x_i) - \lambda^T \left( \sum_{i=1}^{m} A_i x_i - b \right), \tag{5}
\]


and it is defined on the set

\[
W := X_1 \times X_2 \times \cdots \times X_m \times \mathbb{R}^l.
\]

Let $(x_1^*, x_2^*, \ldots, x_m^*; \lambda^*)$ be a saddle point of the Lagrange function (5). Then, for any $\lambda \in \mathbb{R}^l$ and $x_i \in X_i$ ($i = 1, 2, \ldots, m$), we have

\[
L(x_1^*, x_2^*, \ldots, x_m^*; \lambda) \le L(x_1^*, x_2^*, \ldots, x_m^*; \lambda^*) \le L(x_1, x_2, \ldots, x_m; \lambda^*).
\]

Finding a saddle point of $L(x_1, x_2, \ldots, x_m; \lambda)$ is equivalent to finding $w^* = (x_1^*, x_2^*, \ldots, x_m^*; \lambda^*) \in W$ and $\xi_i^* \in \partial f_i(x_i^*)$ ($i = 1, 2, \ldots, m$) such that

\[
\langle x_i - x_i^*, \, \xi_i^* - A_i^T \lambda^* \rangle \ge 0, \quad i = 1, \ldots, m, \quad \text{and} \quad \Big\langle \lambda - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^* - b \Big\rangle \ge 0, \qquad \forall (x_1, x_2, \ldots, x_m; \lambda) \in W. \tag{6}
\]

Finally, we define two auxiliary block-diagonal matrices $\widehat{H}$ and $H$:

\[
\begin{aligned}
\widehat{H} &= \operatorname{diag}\left( \frac{1}{\mu s_1} I + \frac{m}{\mu} A_1^T A_1, \; \frac{1}{\mu s_2} I + \frac{m}{\mu} A_2^T A_2, \; \ldots, \; \frac{1}{\mu s_m} I + \frac{m}{\mu} A_m^T A_m \right), \\
H &= \operatorname{diag}\left( \frac{1}{\mu s_1} I + \frac{m}{\mu} A_1^T A_1, \; \frac{1}{\mu s_2} I + \frac{m}{\mu} A_2^T A_2, \; \ldots, \; \frac{1}{\mu s_m} I + \frac{m}{\mu} A_m^T A_m, \; \mu I \right),
\end{aligned}
\]

where $I$ denotes identity matrices of the appropriate dimensions (the last diagonal block of $H$ being the identity in $\mathbb{R}^{l \times l}$). Since $\mu > 0$ and $s_i > 0$ ($i = 1, 2, \ldots, m$), both $\widehat{H}$ and $H$ are positive definite matrices.
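Since the positive definiteness of $H$ drives the whole analysis, a quick numerical sanity check may help (illustrative code of ours, not from the paper):

```python
# Illustrative check (ours): assemble H for random data and verify
# numerically that it is positive definite.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
m, ell, n = 3, 5, 4                    # number of blocks, row and column sizes
mu, s = 2.0, [0.5, 0.7, 0.9]           # any mu > 0 and s_i > 0 should work
A = [rng.standard_normal((ell, n)) for _ in range(m)]

blocks = [np.eye(n) / (mu * s[i]) + (m / mu) * A[i].T @ A[i] for i in range(m)]
H = block_diag(*blocks, mu * np.eye(ell))
print(np.linalg.eigvalsh(H).min() > 0)  # expected: True
```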

3. Convergence analysis

In this section, we prove the convergence of the extended APGM (4) for solving (1) under the strong convexity assumption on the $f_i$'s. Assume that $f_i$ is strongly convex with modulus $\eta_i > 0$ ($i = 1, 2, \ldots, m$). Let $X^*$ denote the solution set of (1), let $W^*$ denote the solution set of (6), and let the sequence $\{w^k\}$ be generated by (4), where $w^k = (x_1^k, x_2^k, \ldots, x_m^k; \lambda^k)$. We first prove the following lemma.

Lemma 1. Let $(x_1^*, x_2^*, \ldots, x_m^*) \in X^*$ and let $\lambda^*$ be a corresponding Lagrange multiplier associated with the linear constraint. Then

\[
\begin{aligned}
\Big\langle \lambda^k - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\rangle
&\ge \frac{1}{2} \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \, x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\quad + \frac{1}{\mu} \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \, \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle.
\end{aligned}
\tag{7}
\]

Proof. By invoking the first-order optimality condition for the $x_i^{k+1}$-related subproblem in (4), there exists $\xi_i^{k+1} \in \partial f_i(x_i^{k+1})$ such that

\[
\bigg\langle x_i - x_i^{k+1}, \; \xi_i^{k+1} + \frac{1}{\mu s_i} \Big[ x_i^{k+1} - x_i^k + s_i A_i^T \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b - \mu \lambda^k \Big) \Big] \bigg\rangle \ge 0, \qquad \forall x_i \in X_i. \tag{8}
\]

Taking $x_i := x_i^*$ in (8) and $x_i := x_i^{k+1}$ in (6), respectively, we obtain

\[
\bigg\langle x_i^* - x_i^{k+1}, \; \xi_i^{k+1} + \frac{1}{\mu s_i} \big( x_i^{k+1} - x_i^k \big) - A_i^T \Big[ \lambda^k - \frac{1}{\mu} \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \Big) \Big] \bigg\rangle \ge 0 \tag{9}
\]

and

\[
\big\langle x_i^{k+1} - x_i^*, \; \xi_i^* - A_i^T \lambda^* \big\rangle \ge 0. \tag{10}
\]

Adding (9) and (10), we get

\[
\bigg\langle x_i^{k+1} - x_i^*, \; \big( \xi_i^* - \xi_i^{k+1} \big) - A_i^T \big( \lambda^* - \lambda^k \big) - \frac{1}{\mu s_i} \big( x_i^{k+1} - x_i^k \big) - \frac{1}{\mu} A_i^T \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \Big) \bigg\rangle \ge 0.
\]

According to the strong convexity of $f_i$, it follows that

\[
\begin{aligned}
\big\langle A_i (x_i^{k+1} - x_i^*), \; \lambda^k - \lambda^* \big\rangle
&\ge \big\langle x_i^{k+1} - x_i^*, \; \xi_i^{k+1} - \xi_i^* \big\rangle + \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \bigg\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \bigg\rangle \\
&\ge \frac{1}{2} \eta_i \|x_i^{k+1} - x_i^*\|^2 + \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \bigg\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \bigg\rangle.
\end{aligned}
\]


Summing the above inequality over all $i = 1, 2, \ldots, m$ and using the equality $\sum_{i=1}^{m} A_i x_i^* = b$, we have

\[
\begin{aligned}
\Big\langle \lambda^k - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\rangle
&= \sum_{i=1}^{m} \big\langle A_i (x_i^{k+1} - x_i^*), \; \lambda^k - \lambda^* \big\rangle \\
&\ge \frac{1}{2} \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \, x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\quad + \frac{1}{\mu} \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \, \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle,
\end{aligned}
\]

where the last line uses the decomposition $\sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b = \big( \sum_{j=1}^{m} A_j x_j^{k+1} - b \big) + \sum_{j=i}^{m} ( A_j x_j^k - A_j x_j^{k+1} )$ together with $\sum_{i=1}^{m} A_i (x_i^{k+1} - x_i^*) = \sum_{i=1}^{m} A_i x_i^{k+1} - b$.

The proof is completed. □

Next, we find an upper bound for the quantity $\|w^{k+1} - w^*\|_H^2 - \|w^k - w^*\|_H^2$, which measures the progress made by the new iterate $w^{k+1}$. With the help of Lemma 1, we obtain the following lemma.

Lemma 2. Let $w^* = (x_1^*, x_2^*, \ldots, x_m^*; \lambda^*) \in W^*$. Then

\[
\|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - \sum_{i=1}^{m} c_i \|x_i^{k+1} - x_i^*\|^2 - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2, \tag{11}
\]

where $c_i = \eta_i - \frac{3m+1}{\mu} \|A_i\|^2$ (here $\|A_i\|$ denotes the spectral norm of $A_i$).

Proof. It follows from the last equality in (4) and from (7) that

\[
\begin{aligned}
\|\lambda^{k+1} - \lambda^*\|_\mu^2
&= \mu \Big\| \lambda^k - \lambda^* - \frac{1}{\mu} \Big( \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big) \Big\|^2
= \|\lambda^k - \lambda^*\|_\mu^2 - 2 \Big\langle \lambda^k - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\rangle + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\le \|\lambda^k - \lambda^*\|_\mu^2 - \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 - 2 \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \, x_i^{k+1} - x_i^k \big\rangle - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\quad - \frac{2}{\mu} \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \, \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle.
\end{aligned}
\tag{12}
\]

On the other hand, for any $i$,

\[
\begin{aligned}
- \frac{2}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle
&= - \frac{2}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^* \big\rangle - \frac{2}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^* - x_i^k \big\rangle \\
&\le - \frac{2}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 + \frac{2}{\mu s_i} \big| \big\langle x_i^{k+1} - x_i^*, \; x_i^* - x_i^k \big\rangle \big| \\
&\le - \frac{1}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 + \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2
\end{aligned}
\tag{13}
\]

and

\[
\begin{aligned}
- 2 \Big\langle A_i x_i^{k+1} - A_i x_i^*, \; \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle
&= - 2 \Big\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=i}^{m} A_j \big( x_j^k - x_j^* \big) \Big\rangle + 2 \Big\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=i}^{m} A_j \big( x_j^{k+1} - x_j^* \big) \Big\rangle \\
&\le 2 (m - i + 1) \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{j=i}^{m} \|A_j (x_j^{k+1} - x_j^*)\|^2 + \sum_{j=i}^{m} \|A_j (x_j^k - x_j^*)\|^2,
\end{aligned}
\tag{14}
\]

where the inequality uses $2 |\langle a, b \rangle| \le \|a\|^2 + \|b\|^2$ term by term.

Summing up (13) over all $i$, we obtain

\[
- 2 \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle \le - \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2. \tag{15}
\]


Since

\[
\sum_{i=1}^{m} \sum_{j=i}^{m} \|A_j (x_j^{k+1} - x_j^*)\|^2 = \sum_{i=1}^{m} i \, \|A_i (x_i^{k+1} - x_i^*)\|^2
\quad \text{and} \quad
\sum_{i=1}^{m} \sum_{j=i}^{m} \|A_j (x_j^k - x_j^*)\|^2 = \sum_{i=1}^{m} i \, \|A_i (x_i^k - x_i^*)\|^2,
\]

summing up (14) over all $i$, we obtain

\[
\begin{aligned}
- 2 \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \; \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle
&\le \sum_{i=1}^{m} 2 (m - i + 1) \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} i \, \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} i \, \|A_i (x_i^k - x_i^*)\|^2 \\
&= \sum_{i=1}^{m} (2m - i + 2) \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} i \, \|A_i (x_i^k - x_i^*)\|^2 \\
&\le (2m + 1) \sum_{i=1}^{m} \|A_i (x_i^{k+1} - x_i^*)\|^2 + m \sum_{i=1}^{m} \|A_i (x_i^k - x_i^*)\|^2.
\end{aligned}
\tag{16}
\]

By substituting (15) and (16) into (12), we obtain

\[
\begin{aligned}
\|\lambda^{k+1} - \lambda^*\|_\mu^2 &+ \frac{m}{\mu} \sum_{i=1}^{m} \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 \\
&\le \|\lambda^k - \lambda^*\|_\mu^2 + \frac{m}{\mu} \sum_{i=1}^{m} \|A_i (x_i^k - x_i^*)\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2 \\
&\quad - \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 + \frac{3m+1}{\mu} \sum_{i=1}^{m} \|A_i (x_i^{k+1} - x_i^*)\|^2 \\
&\le \|\lambda^k - \lambda^*\|_\mu^2 + \frac{m}{\mu} \sum_{i=1}^{m} \|A_i (x_i^k - x_i^*)\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2 \\
&\quad - \sum_{i=1}^{m} \Big( \eta_i - \frac{3m+1}{\mu} \|A_i\|^2 \Big) \|x_i^{k+1} - x_i^*\|^2 - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2,
\end{aligned}
\]

where the second inequality follows from $\|A_i (x_i^{k+1} - x_i^*)\|^2 \le \|A_i\|^2 \|x_i^{k+1} - x_i^*\|^2$. In view of the definition of $H$, this is exactly (11). □

With the help of the preceding two lemmas, we can now establish the convergence of the extended APGM (4) for solving (1) with strongly convex $f_i$'s.

Theorem 3. Let the $f_i$'s in (1) be strongly convex with moduli $\eta_i$. For any

\[
\mu > \max_{1 \le i \le m} \left\{ \frac{(3m+1) \|A_i\|^2}{\eta_i} \right\},
\]

the sequence $\{w^k\}$ generated by the APGM (4) converges to a solution of (6).

Proof. Since $\mu > \max_{1 \le i \le m} \{ (3m+1) \|A_i\|^2 / \eta_i \}$, we have $c_i := \eta_i - \frac{3m+1}{\mu} \|A_i\|^2 > 0$. It follows from (11) that

\[
\|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 \le \cdots \le \|w^0 - w^*\|_H^2 < +\infty. \tag{17}
\]

Summing (11) over $k$, we have

\[
\sum_{k=0}^{\infty} \left[ \sum_{i=1}^{m} c_i \|x_i^{k+1} - x_i^*\|^2 + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \right] \le \sum_{k=0}^{\infty} \big( \|w^k - w^*\|_H^2 - \|w^{k+1} - w^*\|_H^2 \big) < +\infty.
\]

Thus

\[
\lim_{k \to \infty} \|x_i^{k+1} - x_i^*\| = 0 \quad \text{for all } i = 1, \ldots, m, \tag{18}
\]

and

\[
\lim_{k \to \infty} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\| = 0. \tag{19}
\]

Taking (17) and (18) into account, it follows that the sequence $\{\|\lambda^k - \lambda^*\|_\mu^2\}$ converges. In view of the boundedness of $\{\lambda^k\}$ and $\{\xi_i^k\}$, where $\xi_i^k \in \partial f_i(x_i^k)$, we can take subsequences $\{\lambda^{k_j}\}$ and $\{\xi_i^{k_j}\}$ such that $\lim_{k_j \to \infty} \lambda^{k_j} = \bar{\lambda}$ and $\lim_{k_j \to \infty} \xi_i^{k_j} = \bar{\xi}_i$. Taking the limit along this subsequence in (8) and (19), we obtain

\[
\bar{\xi}_i \in \partial f_i(x_i^*), \qquad \langle x_i' - x_i^*, \; \bar{\xi}_i - A_i^T \bar{\lambda} \rangle \ge 0, \quad \forall x_i' \in X_i, \qquad \text{and} \qquad \sum_{i=1}^{m} A_i x_i^* = b,
\]

which means that $(x_1^*, x_2^*, \ldots, x_m^*; \bar{\lambda})$ is a saddle point of the Lagrange function (5). Since $\lambda^*$ is an arbitrary Lagrange multiplier corresponding to $(x_1^*, x_2^*, \ldots, x_m^*)$, we can set $\lambda^* = \bar{\lambda}$ in (17) and conclude that the whole sequence $\{\lambda^k\}$ converges to $\bar{\lambda}$. This completes the proof. □
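To illustrate Theorem 3 numerically (again our own sketch, not the authors' experiment), one can pick $\mu$ just above the threshold and run the `apgm` routine sketched after scheme (4); the constraint residual then decays as the theorem predicts.

```python
# Illustrative driver (ours) for the apgm sketch given after scheme (4),
# with mu chosen according to Theorem 3.
import numpy as np

rng = np.random.default_rng(1)
m, ell, n = 3, 5, 4
A = [rng.standard_normal((ell, n)) for _ in range(m)]
c = [rng.standard_normal(n) for _ in range(m)]
eta = [1.0, 2.0, 1.5]                   # strong convexity moduli of the f_i
b = sum(A[i] @ c[i] for i in range(m))  # makes x_i* = c_i the exact solution

norms = [np.linalg.norm(Ai, 2) for Ai in A]        # spectral norms ||A_i||
mu = 1.01 * max((3 * m + 1) * norms[i] ** 2 / eta[i] for i in range(m))
s = [1.0 / norms[i] ** 2 for i in range(m)]        # any s_i > 0 is admissible

x, lam = apgm(A, c, eta, b, mu, s, iters=20000)
# constraint residual; Theorem 3 says this tends to 0 as iterations grow
print(np.linalg.norm(sum(A[i] @ x[i] for i in range(m)) - b))
```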

4. Conclusion

We considered the linearly constrained separable convex programming problem whose objective function is separable into $m \, (\ge 3)$ individual convex functions without coupled variables. The alternating proximal gradient method is an effective method for the case $m = 2$, but it was unknown whether its convergence can be extended to the general case $m \ge 3$. In this paper, we showed the global convergence of the alternating proximal gradient method for minimizing the sum of any number of strongly convex separable functions.

Acknowledgments

This work was supported by the National Science Foundation of China (Grant No. 61179033) and the Doctoral Fund of Innovation of Beijing University of Technology.

References

[1] D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Comput. Math. Appl. 2 (1976) 17–40.
[2] D. Gabay, Applications of the method of multipliers to variational inequalities, in: M. Fortin, R. Glowinski (Eds.), Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland, Amsterdam, 1983, pp. 299–331.
[3] D.R. Han, X.M. Yuan, A note on the alternating direction method of multipliers, J. Optim. Theory Appl. 155 (2012) 227–238.
[4] B.S. He, M. Tao, X.M. Yuan, Alternating direction method with Gaussian back substitution for separable convex programming, SIAM J. Optim. 22 (2012) 313–340.
[5] M.Y. Hong, Z.Q. Luo, On the linear convergence of the alternating direction method of multipliers, arXiv preprint arXiv:1208.3922 (2012).
[6] S.Q. Ma, Alternating proximal gradient method for convex minimization, preprint (2012).
[7] Y.G. Peng, A. Ganesh, J. Wright, W.L. Xu, Y. Ma, RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2233–2246.
[8] M. Tao, X.M. Yuan, Recovering low-rank and sparse components of matrices from incomplete and noisy observations, SIAM J. Optim. 21 (2011) 57–81.