A note on the convergence of alternating proximal gradient method


Applied Mathematics and Computation 228 (2014) 258–263


Miantao Chao*, Caozong Cheng

Department of Mathematics, Beijing University of Technology, 100124 Beijing, China

* Corresponding author. E-mail addresses: [email protected] (M. Chao), [email protected] (C. Cheng).

Keywords: Alternating proximal gradient method; Alternating direction method of multipliers; Strongly convex functions; Global convergence

Abstract. We consider a class of linearly constrained separable convex programming problems whose objective functions are the sum of $m$ convex functions without coupled variables. The alternating proximal gradient method is an effective method for the case $m = 2$, but it is unknown whether its convergence can be extended to the general case $m \ge 3$. This note shows the global convergence of this extension when the involved functions are strongly convex.

1. Introduction

We consider the following convex minimization model with linear constraints and separable objective function:

\[
\min \; \sum_{i=1}^{m} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{m} A_i x_i = b, \qquad x_i \in X_i, \; i = 1, \ldots, m, \tag{1}
\]

where $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$ ($i = 1, \ldots, m$) are closed proper convex functions, $X_i \subseteq \mathbb{R}^{n_i}$ ($i = 1, \ldots, m$) are closed convex sets, $A_i \in \mathbb{R}^{l \times n_i}$ ($i = 1, \ldots, m$) are given matrices, and $b \in \mathbb{R}^l$ is a given vector. Throughout, the solution set of (1) is assumed to be nonempty. Our discussion focuses on the particular case of (1) where $m \ge 3$.

A fundamental method for solving (1) is the alternating direction method of multipliers (ADMM), presented originally in [1,2]. The standard ADMM iterative scheme is

\[
\left\{
\begin{aligned}
x_1^{k+1} &\in \arg\min_{x_1 \in X_1} \left\{ f_1(x_1) + \frac{1}{2\mu} \Big\| A_1 x_1 + \sum_{j=2}^{m} A_j x_j^k - b - \mu \lambda^k \Big\|^2 \right\}, \\
x_i^{k+1} &\in \arg\min_{x_i \in X_i} \left\{ f_i(x_i) + \frac{1}{2\mu} \Big\| \sum_{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b - \mu \lambda^k \Big\|^2 \right\}, \quad i = 2, \ldots, m-1, \\
x_m^{k+1} &\in \arg\min_{x_m \in X_m} \left\{ f_m(x_m) + \frac{1}{2\mu} \Big\| \sum_{j=1}^{m-1} A_j x_j^{k+1} + A_m x_m - b - \mu \lambda^k \Big\|^2 \right\}, \\
\lambda^{k+1} &= \lambda^k - \frac{1}{\mu} \left( \sum_{j=1}^{m} A_j x_j^{k+1} - b \right).
\end{aligned}
\right. \tag{2}
\]
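To see what a block update of (2) actually costs, consider the simplest strongly convex case $f_i(x) = \frac{\eta_i}{2}\|x - c_i\|^2$ with $X_i = \mathbb{R}^{n_i}$. The following sketch (our illustration, not code from the paper; all names are ours) shows that each ADMM subproblem then requires solving a linear system involving $A_i^T A_i$:

```python
# Illustrative ADMM x_i-subproblem of scheme (2) for the quadratic case
# f_i(x) = (eta_i/2) * ||x - c_i||^2 (our sketch, not from the paper).
import numpy as np

def admm_block_update(i, A, c, eta, b, mu, lam, x_new, x_old):
    """One x_i update of (2): blocks j < i already updated (x_new),
    blocks j > i still at the old iterate (x_old)."""
    m = len(A)
    rest = (sum(A[j] @ x_new[j] for j in range(i))
            + sum(A[j] @ x_old[j] for j in range(i + 1, m)) - b - mu * lam)
    n = A[i].shape[1]
    # optimality: eta_i (x - c_i) + (1/mu) A_i^T (A_i x + rest) = 0
    M = eta[i] * np.eye(n) + (A[i].T @ A[i]) / mu
    return np.linalg.solve(M, eta[i] * c[i] - (A[i].T @ rest) / mu)
```

The APGM recalled below replaces this inner linear solve by a single gradient step on the quadratic penalty followed by a proximal step on $f_i$ alone.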


The ADMM has been well studied in the literature for the special case $m = 2$. Without further assumptions, the convergence of (2) for the general case $m \ge 3$ remains open, although its efficiency has been verified empirically in [7,8]. In [3], the convergence of (2) is shown under the condition that all $f_i$ are strongly convex. When each convex function $f_i$ in (1) has a particular structure and the step size $1/\mu$ in the update of $\lambda^{k+1}$ is sufficiently small to fulfill a certain error bound, the resulting scheme is convergent (see [5]). He et al. [4] showed that the resulting sequence is convergent if the output of (2) is further corrected by a substitution procedure.

Based on the framework of ADMM (2), Ma [6] proposed an alternating proximal gradient method (APGM) for problem (1) in the special case $m = 2$, as follows:

\[
\left\{
\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1 \in X_1} \left\{ f_1(x_1) + \frac{1}{2\mu s_1} \Big\| x_1 - \big[ x_1^k - s_1 A_1^T ( A_1 x_1^k + A_2 x_2^k - b - \mu \lambda^k ) \big] \Big\|^2 \right\}, \\
x_2^{k+1} &= \arg\min_{x_2 \in X_2} \left\{ f_2(x_2) + \frac{1}{2\mu s_2} \Big\| x_2 - \big[ x_2^k - s_2 A_2^T ( A_1 x_1^{k+1} + A_2 x_2^k - b - \mu \lambda^k ) \big] \Big\|^2 \right\}, \\
\lambda^{k+1} &= \lambda^k - \frac{1}{\mu} \left( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \right),
\end{aligned}
\right. \tag{3}
\]

where $s_1$ and $s_2$ are the step sizes for the proximal gradient steps.
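A routine completing-the-square identity, which we record here for clarity (it is not stated in the original note), shows that each subproblem of (3) is exactly a proximal step. For the first update,

\[
x_1^{k+1} = \operatorname{prox}_{t g}\big( x_1^k - s_1 A_1^T ( A_1 x_1^k + A_2 x_2^k - b - \mu \lambda^k ) \big), \qquad g := f_1 + \iota_{X_1}, \; t := \mu s_1,
\]

where $\iota_{X_1}$ is the indicator function of $X_1$ and $\operatorname{prox}_{t g}(v) := \arg\min_x \big\{ g(x) + \frac{1}{2t} \|x - v\|^2 \big\}$.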

It is easy to see that the subproblems in APGM (3) are easier to solve than the subproblems in ADMM (2). A natural idea for solving (1) is to extend the APGM (3) from the special case $m = 2$ to the general case $m \ge 3$. This yields the following scheme for (1) when $m \ge 3$:

\[
\left\{
\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1 \in X_1} \left\{ f_1(x_1) + \frac{1}{2\mu s_1} \bigg\| x_1 - \Big[ x_1^k - s_1 A_1^T \Big( \sum_{j=1}^{m} A_j x_j^k - b - \mu \lambda^k \Big) \Big] \bigg\|^2 \right\}, \\
x_i^{k+1} &= \arg\min_{x_i \in X_i} \left\{ f_i(x_i) + \frac{1}{2\mu s_i} \bigg\| x_i - \Big[ x_i^k - s_i A_i^T \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b - \mu \lambda^k \Big) \Big] \bigg\|^2 \right\}, \quad i = 2, \ldots, m-1, \\
x_m^{k+1} &= \arg\min_{x_m \in X_m} \left\{ f_m(x_m) + \frac{1}{2\mu s_m} \bigg\| x_m - \Big[ x_m^k - s_m A_m^T \Big( \sum_{j=1}^{m-1} A_j x_j^{k+1} + A_m x_m^k - b - \mu \lambda^k \Big) \Big] \bigg\|^2 \right\}, \\
\lambda^{k+1} &= \lambda^k - \frac{1}{\mu} \left( \sum_{j=1}^{m} A_j x_j^{k+1} - b \right),
\end{aligned}
\right. \tag{4}
\]

where $\mu > 0$ and $s_i > 0$, $i = 1, 2, \ldots, m$.
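To make the update order in (4) concrete, here is a minimal NumPy sketch (our illustration, not code from the paper). It assumes the simplest strongly convex case $f_i(x) = \frac{\eta_i}{2}\|x - c_i\|^2$ with $X_i = \mathbb{R}^{n_i}$, so that each subproblem is a closed-form proximal step; all names below are ours.

```python
# Minimal sketch of the extended APGM (4) for the special case
# f_i(x) = (eta_i/2) * ||x - c_i||^2 and X_i = R^{n_i} (illustrative only).
import numpy as np

def apgm(A, c, eta, b, mu, s, iters=5000):
    """A: list of m matrices A_i; c, eta: data of the quadratic f_i;
    b: right-hand side vector; mu > 0 and s_i > 0 as in scheme (4)."""
    m = len(A)
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros_like(b)
    for _ in range(iters):
        x_old = [xi.copy() for xi in x]
        for i in range(m):
            # blocks j < i use the new iterates, blocks j >= i the old ones
            r = (sum(A[j] @ x[j] for j in range(i))
                 + sum(A[j] @ x_old[j] for j in range(i, m)) - b - mu * lam)
            v = x_old[i] - s[i] * (A[i].T @ r)   # gradient step on the penalty
            # proximal step: argmin_x f_i(x) + ||x - v||^2 / (2*mu*s_i),
            # closed-form for the quadratic f_i above
            x[i] = (v + mu * s[i] * eta[i] * c[i]) / (1.0 + mu * s[i] * eta[i])
        lam = lam - (sum(A[i] @ x[i] for i in range(m)) - b) / mu
    return x, lam
```

Theorem 3 below guarantees global convergence of this loop once $\mu$ exceeds $\max_{1 \le i \le m} (3m+1)\|A_i\|^2/\eta_i$; a driver choosing $\mu$ that way is sketched after the proof of Theorem 3.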

Similar to the ADMM, we want to investigate whether the APGM's convergence can be extended to the general case, i.e., the case $m \ge 3$. In this paper, we show that it is globally convergent for the general case under the strong convexity assumption on the $f_i$ ($i = 1, 2, \ldots, m$).

2. Preliminaries

Let $\|\cdot\|$ denote the Euclidean norm. For any positive definite matrix $M$, we denote by $\|\cdot\|_M$ the $M$-norm, i.e., $\|x\|_M := \sqrt{x^T M x}$. If $M$ is the product of a positive parameter $\beta$ and the identity matrix $I$, i.e., $M = \beta I$, we use the simpler notation $\|\cdot\|_M = \|\cdot\|_\beta$. Let $f : \mathbb{R}^n \to (-\infty, +\infty]$. If the domain of $f$, denoted by $\operatorname{dom} f := \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$, is not empty, $f$ is said to be proper. We say that $f$ is convex if

\[
f(tx + (1-t)y) \le t f(x) + (1-t) f(y), \qquad \forall x, y \in \mathbb{R}^n, \; \forall t \in [0, 1].
\]

Furthermore, $f$ is said to be strongly convex with modulus $\eta > 0$ if

\[
f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{1}{2} \eta \, t (1-t) \|x - y\|^2, \qquad \forall x, y \in \mathbb{R}^n, \; \forall t \in [0, 1].
\]

For a convex function $f$, the subdifferential of $f$ is the set-valued operator defined by

\[
\partial f(x) := \{ \xi \in \mathbb{R}^n \mid f(y) \ge f(x) + \langle y - x, \xi \rangle, \; \forall y \in \operatorname{dom} f \}.
\]

If a proper function $f$ is strongly convex with modulus $\eta > 0$, then

\[
\langle s_1 - s_2, \, x_1 - x_2 \rangle \ge \frac{1}{2} \eta \|x_1 - x_2\|^2, \qquad \forall s_1 \in \partial f(x_1), \; \forall s_2 \in \partial f(x_2).
\]
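For a concrete instance (an illustration we add here, not part of the original note), take $f(x) = \frac{\eta}{2}\|x\|^2$: it is strongly convex with modulus $\eta$, its subdifferential is the singleton $\partial f(x) = \{\eta x\}$, and

\[
\langle \eta x_1 - \eta x_2, \, x_1 - x_2 \rangle = \eta \|x_1 - x_2\|^2 \ge \frac{1}{2} \eta \|x_1 - x_2\|^2,
\]

so the inequality above holds with room to spare.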

The Lagrange function of (1) is given by

\[
L(x_1, x_2, \ldots, x_m; \lambda) = \sum_{i=1}^{m} f_i(x_i) - \lambda^T \left( \sum_{i=1}^{m} A_i x_i - b \right), \tag{5}
\]


and it is defined on the set

\[
W := X_1 \times X_2 \times \cdots \times X_m \times \mathbb{R}^l.
\]

Let $(x_1^*, x_2^*, \ldots, x_m^*; \lambda^*)$ be a saddle point of the Lagrange function (5). Then, for any $\lambda \in \mathbb{R}^l$ and $x_i \in X_i$ ($i = 1, 2, \ldots, m$), we have

\[
L(x_1^*, x_2^*, \ldots, x_m^*; \lambda) \le L(x_1^*, x_2^*, \ldots, x_m^*; \lambda^*) \le L(x_1, x_2, \ldots, x_m; \lambda^*).
\]

Finding a saddle point of $L(x_1, x_2, \ldots, x_m; \lambda)$ is equivalent to finding $w^* = (x_1^*, x_2^*, \ldots, x_m^*; \lambda^*) \in W$ and $\xi_i^* \in \partial f_i(x_i^*)$ ($i = 1, 2, \ldots, m$) such that

\[
\langle x_i - x_i^*, \, \xi_i^* - A_i^T \lambda^* \rangle \ge 0, \quad i = 1, \ldots, m, \quad \text{and} \quad \Big\langle \lambda - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^* - b \Big\rangle \ge 0, \qquad \forall (x_1, x_2, \ldots, x_m; \lambda) \in W. \tag{6}
\]

Finally, we define two auxiliary block-diagonal matrices $\widehat{H}$ and $H$:

\[
\begin{aligned}
\widehat{H} &= \operatorname{diag}\left( \frac{1}{\mu s_1} I + \frac{m}{\mu} A_1^T A_1, \; \frac{1}{\mu s_2} I + \frac{m}{\mu} A_2^T A_2, \; \ldots, \; \frac{1}{\mu s_m} I + \frac{m}{\mu} A_m^T A_m \right), \\
H &= \operatorname{diag}\left( \frac{1}{\mu s_1} I + \frac{m}{\mu} A_1^T A_1, \; \frac{1}{\mu s_2} I + \frac{m}{\mu} A_2^T A_2, \; \ldots, \; \frac{1}{\mu s_m} I + \frac{m}{\mu} A_m^T A_m, \; \mu I \right),
\end{aligned}
\]

where $I$ denotes identity matrices of the appropriate dimensions (the last diagonal block of $H$ being the identity in $\mathbb{R}^{l \times l}$). Since $\mu > 0$ and $s_i > 0$ ($i = 1, 2, \ldots, m$), both $\widehat{H}$ and $H$ are positive definite matrices.
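Since the positive definiteness of $H$ drives the whole analysis, a quick numerical sanity check may help (illustrative code of ours, not from the paper):

```python
# Illustrative check (ours): assemble H for random data and verify
# numerically that it is positive definite.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
m, ell, n = 3, 5, 4                    # number of blocks, row and column sizes
mu, s = 2.0, [0.5, 0.7, 0.9]           # any mu > 0 and s_i > 0 should work
A = [rng.standard_normal((ell, n)) for _ in range(m)]

blocks = [np.eye(n) / (mu * s[i]) + (m / mu) * A[i].T @ A[i] for i in range(m)]
H = block_diag(*blocks, mu * np.eye(ell))
print(np.linalg.eigvalsh(H).min() > 0)  # expected: True
```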

3. Convergence analysis

In this section, we prove the convergence of the extended APGM (4) for solving (1) under the strong convexity assumption on the $f_i$'s. Assume that $f_i$ is strongly convex with modulus $\eta_i > 0$ ($i = 1, 2, \ldots, m$). Let $X^*$ denote the solution set of (1), let $W^*$ denote the solution set of (6), and let the sequence $\{w^k\}$ be generated by (4), where $w^k = (x_1^k, x_2^k, \ldots, x_m^k; \lambda^k)$. We first prove the following lemma.

Lemma 1. Let $(x_1^*, x_2^*, \ldots, x_m^*) \in X^*$ and let $\lambda^*$ be a corresponding Lagrange multiplier associated with the linear constraint. Then

\[
\begin{aligned}
\Big\langle \lambda^k - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\rangle
&\ge \frac{1}{2} \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \, x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\quad + \frac{1}{\mu} \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \, \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle.
\end{aligned}
\tag{7}
\]

Proof. By invoking the first-order optimality condition for the $x_i^{k+1}$-related subproblem in (4), there exists $\xi_i^{k+1} \in \partial f_i(x_i^{k+1})$ such that

\[
\bigg\langle x_i - x_i^{k+1}, \; \xi_i^{k+1} + \frac{1}{\mu s_i} \Big[ x_i^{k+1} - x_i^k + s_i A_i^T \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b - \mu \lambda^k \Big) \Big] \bigg\rangle \ge 0, \qquad \forall x_i \in X_i. \tag{8}
\]

Taking $x_i := x_i^*$ in (8) and $x_i := x_i^{k+1}$ in (6), respectively, we obtain

\[
\bigg\langle x_i^* - x_i^{k+1}, \; \xi_i^{k+1} + \frac{1}{\mu s_i} \big( x_i^{k+1} - x_i^k \big) - A_i^T \Big[ \lambda^k - \frac{1}{\mu} \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \Big) \Big] \bigg\rangle \ge 0 \tag{9}
\]

and

\[
\big\langle x_i^{k+1} - x_i^*, \; \xi_i^* - A_i^T \lambda^* \big\rangle \ge 0. \tag{10}
\]

Adding (9) and (10), we get

\[
\bigg\langle x_i^{k+1} - x_i^*, \; \big( \xi_i^* - \xi_i^{k+1} \big) - A_i^T \big( \lambda^* - \lambda^k \big) - \frac{1}{\mu s_i} \big( x_i^{k+1} - x_i^k \big) - \frac{1}{\mu} A_i^T \Big( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \Big) \bigg\rangle \ge 0.
\]

According to the strong convexity of $f_i$, it follows that

\[
\begin{aligned}
\big\langle A_i (x_i^{k+1} - x_i^*), \; \lambda^k - \lambda^* \big\rangle
&\ge \big\langle x_i^{k+1} - x_i^*, \; \xi_i^{k+1} - \xi_i^* \big\rangle + \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \bigg\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \bigg\rangle \\
&\ge \frac{1}{2} \eta_i \|x_i^{k+1} - x_i^*\|^2 + \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \bigg\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \bigg\rangle.
\end{aligned}
\]


Summing the above inequality over all $i = 1, 2, \ldots, m$ and using the equality $\sum_{i=1}^{m} A_i x_i^* = b$, we have

\[
\begin{aligned}
\Big\langle \lambda^k - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\rangle
&= \sum_{i=1}^{m} \big\langle A_i (x_i^{k+1} - x_i^*), \; \lambda^k - \lambda^* \big\rangle \\
&\ge \frac{1}{2} \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \, x_i^{k+1} - x_i^k \big\rangle + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\quad + \frac{1}{\mu} \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \, \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle,
\end{aligned}
\]

where the last line uses the decomposition $\sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b = \big( \sum_{j=1}^{m} A_j x_j^{k+1} - b \big) + \sum_{j=i}^{m} ( A_j x_j^k - A_j x_j^{k+1} )$ together with $\sum_{i=1}^{m} A_i (x_i^{k+1} - x_i^*) = \sum_{i=1}^{m} A_i x_i^{k+1} - b$.

The proof is completed. □

Next, we find an upper bound for the quantity $\|w^{k+1} - w^*\|_H^2 - \|w^k - w^*\|_H^2$, which measures the progress made by the new iterate $w^{k+1}$. With the help of Lemma 1, we obtain the following lemma.

Lemma 2. Let $w^* = (x_1^*, x_2^*, \ldots, x_m^*; \lambda^*) \in W^*$. Then

\[
\|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - \sum_{i=1}^{m} c_i \|x_i^{k+1} - x_i^*\|^2 - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2, \tag{11}
\]

where $c_i = \eta_i - \frac{3m+1}{\mu} \|A_i\|^2$ (here $\|A_i\|$ denotes the spectral norm of $A_i$).

Proof. It follows from the last equality in (4) and from (7) that

\[
\begin{aligned}
\|\lambda^{k+1} - \lambda^*\|_\mu^2
&= \mu \Big\| \lambda^k - \lambda^* - \frac{1}{\mu} \Big( \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big) \Big\|^2
= \|\lambda^k - \lambda^*\|_\mu^2 - 2 \Big\langle \lambda^k - \lambda^*, \, \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\rangle + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\le \|\lambda^k - \lambda^*\|_\mu^2 - \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 - 2 \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \, x_i^{k+1} - x_i^k \big\rangle - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \\
&\quad - \frac{2}{\mu} \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \, \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle.
\end{aligned}
\tag{12}
\]

On the other hand, for any $i$,

\[
\begin{aligned}
- \frac{2}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle
&= - \frac{2}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^* \big\rangle - \frac{2}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^* - x_i^k \big\rangle \\
&\le - \frac{2}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 + \frac{2}{\mu s_i} \big| \big\langle x_i^{k+1} - x_i^*, \; x_i^* - x_i^k \big\rangle \big| \\
&\le - \frac{1}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 + \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2
\end{aligned}
\tag{13}
\]

and

\[
\begin{aligned}
- 2 \Big\langle A_i x_i^{k+1} - A_i x_i^*, \; \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle
&= - 2 \Big\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=i}^{m} A_j \big( x_j^k - x_j^* \big) \Big\rangle + 2 \Big\langle A_i (x_i^{k+1} - x_i^*), \; \sum_{j=i}^{m} A_j \big( x_j^{k+1} - x_j^* \big) \Big\rangle \\
&\le 2 (m - i + 1) \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{j=i}^{m} \|A_j (x_j^{k+1} - x_j^*)\|^2 + \sum_{j=i}^{m} \|A_j (x_j^k - x_j^*)\|^2,
\end{aligned}
\tag{14}
\]

where the inequality uses $2 |\langle a, b \rangle| \le \|a\|^2 + \|b\|^2$ term by term.

Summing up (13) over all $i$, we obtain

\[
- 2 \sum_{i=1}^{m} \frac{1}{\mu s_i} \big\langle x_i^{k+1} - x_i^*, \; x_i^{k+1} - x_i^k \big\rangle \le - \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2. \tag{15}
\]


Since

\[
\sum_{i=1}^{m} \sum_{j=i}^{m} \|A_j (x_j^{k+1} - x_j^*)\|^2 = \sum_{i=1}^{m} i \, \|A_i (x_i^{k+1} - x_i^*)\|^2
\quad \text{and} \quad
\sum_{i=1}^{m} \sum_{j=i}^{m} \|A_j (x_j^k - x_j^*)\|^2 = \sum_{i=1}^{m} i \, \|A_i (x_i^k - x_i^*)\|^2,
\]

summing up (14) over all $i$, we obtain

\[
\begin{aligned}
- 2 \sum_{i=1}^{m} \Big\langle A_i x_i^{k+1} - A_i x_i^*, \; \sum_{j=i}^{m} \big( A_j x_j^k - A_j x_j^{k+1} \big) \Big\rangle
&\le \sum_{i=1}^{m} 2 (m - i + 1) \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} i \, \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} i \, \|A_i (x_i^k - x_i^*)\|^2 \\
&= \sum_{i=1}^{m} (2m - i + 2) \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} i \, \|A_i (x_i^k - x_i^*)\|^2 \\
&\le (2m + 1) \sum_{i=1}^{m} \|A_i (x_i^{k+1} - x_i^*)\|^2 + m \sum_{i=1}^{m} \|A_i (x_i^k - x_i^*)\|^2.
\end{aligned}
\tag{16}
\]

By substituting (15) and (16) into (12), we obtain

\[
\begin{aligned}
\|\lambda^{k+1} - \lambda^*\|_\mu^2 &+ \frac{m}{\mu} \sum_{i=1}^{m} \|A_i (x_i^{k+1} - x_i^*)\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^{k+1} - x_i^*\|^2 \\
&\le \|\lambda^k - \lambda^*\|_\mu^2 + \frac{m}{\mu} \sum_{i=1}^{m} \|A_i (x_i^k - x_i^*)\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2 \\
&\quad - \sum_{i=1}^{m} \eta_i \|x_i^{k+1} - x_i^*\|^2 - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 + \frac{3m+1}{\mu} \sum_{i=1}^{m} \|A_i (x_i^{k+1} - x_i^*)\|^2 \\
&\le \|\lambda^k - \lambda^*\|_\mu^2 + \frac{m}{\mu} \sum_{i=1}^{m} \|A_i (x_i^k - x_i^*)\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i} \|x_i^k - x_i^*\|^2 \\
&\quad - \sum_{i=1}^{m} \Big( \eta_i - \frac{3m+1}{\mu} \|A_i\|^2 \Big) \|x_i^{k+1} - x_i^*\|^2 - \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2,
\end{aligned}
\]

where the second inequality follows from $\|A_i (x_i^{k+1} - x_i^*)\|^2 \le \|A_i\|^2 \|x_i^{k+1} - x_i^*\|^2$. In view of the definition of $H$, this is exactly (11). □

With the help of the preceding two lemmas, we can now establish the convergence of the extended APGM (4) for solving (1) with strongly convex $f_i$'s.

Theorem 3. Let the $f_i$'s in (1) be strongly convex with moduli $\eta_i$. For any

\[
\mu > \max_{1 \le i \le m} \left\{ \frac{(3m+1) \|A_i\|^2}{\eta_i} \right\},
\]

the sequence $\{w^k\}$ generated by the APGM (4) converges to a solution of (6).

Proof. Since $\mu > \max_{1 \le i \le m} \{ (3m+1) \|A_i\|^2 / \eta_i \}$, we have $c_i := \eta_i - \frac{3m+1}{\mu} \|A_i\|^2 > 0$. It follows from (11) that

\[
\|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 \le \cdots \le \|w^0 - w^*\|_H^2 < +\infty. \tag{17}
\]

Summing (11) over $k$, we have

\[
\sum_{k=0}^{\infty} \left[ \sum_{i=1}^{m} c_i \|x_i^{k+1} - x_i^*\|^2 + \frac{1}{\mu} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\|^2 \right] \le \sum_{k=0}^{\infty} \big( \|w^k - w^*\|_H^2 - \|w^{k+1} - w^*\|_H^2 \big) < +\infty.
\]

Thus

\[
\lim_{k \to \infty} \|x_i^{k+1} - x_i^*\| = 0 \quad \text{for all } i = 1, \ldots, m, \tag{18}
\]

and

\[
\lim_{k \to \infty} \Big\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \Big\| = 0. \tag{19}
\]

Taking (17) and (18) into account, it follows that the sequence $\{\|\lambda^k - \lambda^*\|_\mu^2\}$ converges. In view of the boundedness of $\{\lambda^k\}$ and $\{\xi_i^k\}$, where $\xi_i^k \in \partial f_i(x_i^k)$, we can take subsequences $\{\lambda^{k_j}\}$ and $\{\xi_i^{k_j}\}$ such that $\lim_{k_j \to \infty} \lambda^{k_j} = \bar{\lambda}$ and $\lim_{k_j \to \infty} \xi_i^{k_j} = \bar{\xi}_i$. Taking the limit along this subsequence in (8) and (19), we obtain

\[
\bar{\xi}_i \in \partial f_i(x_i^*), \qquad \langle x_i' - x_i^*, \; \bar{\xi}_i - A_i^T \bar{\lambda} \rangle \ge 0, \quad \forall x_i' \in X_i, \qquad \text{and} \qquad \sum_{i=1}^{m} A_i x_i^* = b,
\]

which means that $(x_1^*, x_2^*, \ldots, x_m^*; \bar{\lambda})$ is a saddle point of the Lagrange function (5). Since $\lambda^*$ is an arbitrary Lagrange multiplier corresponding to $(x_1^*, x_2^*, \ldots, x_m^*)$, we can set $\lambda^* = \bar{\lambda}$ in (17) and conclude that the whole sequence $\{\lambda^k\}$ converges to $\bar{\lambda}$. This completes the proof. □
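To illustrate Theorem 3 numerically (again our own sketch, not the authors' experiment), one can pick $\mu$ just above the threshold and run the `apgm` routine sketched after scheme (4); the constraint residual then decays as the theorem predicts.

```python
# Illustrative driver (ours) for the apgm sketch given after scheme (4),
# with mu chosen according to Theorem 3.
import numpy as np

rng = np.random.default_rng(1)
m, ell, n = 3, 5, 4
A = [rng.standard_normal((ell, n)) for _ in range(m)]
c = [rng.standard_normal(n) for _ in range(m)]
eta = [1.0, 2.0, 1.5]                   # strong convexity moduli of the f_i
b = sum(A[i] @ c[i] for i in range(m))  # makes x_i* = c_i the exact solution

norms = [np.linalg.norm(Ai, 2) for Ai in A]        # spectral norms ||A_i||
mu = 1.01 * max((3 * m + 1) * norms[i] ** 2 / eta[i] for i in range(m))
s = [1.0 / norms[i] ** 2 for i in range(m)]        # any s_i > 0 is admissible

x, lam = apgm(A, c, eta, b, mu, s, iters=20000)
# constraint residual; Theorem 3 says this tends to 0 as iterations grow
print(np.linalg.norm(sum(A[i] @ x[i] for i in range(m)) - b))
```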

4. Conclusion

We considered the linearly constrained separable convex programming problem whose objective function is separable into $m \, (\ge 3)$ individual convex functions without coupled variables. The alternating proximal gradient method is an effective method for the case $m = 2$, but it was unknown whether its convergence can be extended to the general case $m \ge 3$. In this paper, we showed the global convergence of the alternating proximal gradient method for minimizing the sum of any number of strongly convex separable functions.

Acknowledgments

This work was supported by the National Science Foundation of China (Grant No. 61179033) and the Doctoral Fund of Innovation of Beijing University of Technology.

References

[1] D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Comput. Math. Appl. 2 (1976) 17–40.
[2] D. Gabay, Applications of the method of multipliers to variational inequalities, in: M. Fortin, R. Glowinski (Eds.), Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland, Amsterdam, 1983, pp. 299–331.
[3] D.R. Han, X.M. Yuan, A note on the alternating direction method of multipliers, J. Optim. Theory Appl. 155 (2012) 227–238.
[4] B.S. He, M. Tao, X.M. Yuan, Alternating direction method with Gaussian back substitution for separable convex programming, SIAM J. Optim. 22 (2012) 313–340.
[5] M.Y. Hong, Z.Q. Luo, On the linear convergence of the alternating direction method of multipliers, arXiv preprint arXiv:1208.3922 (2012).
[6] S.Q. Ma, Alternating proximal gradient method for convex minimization, preprint (2012).
[7] Y.G. Peng, A. Ganesh, J. Wright, W.L. Xu, Y. Ma, RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2233–2246.
[8] M. Tao, X.M. Yuan, Recovering low-rank and sparse components of matrices from incomplete and noisy observations, SIAM J. Optim. 21 (2011) 57–81.